Jan Wright on Creating ePUB Indexes in InDesign
On Wednesday, March 7, 2012, Jan Wright (@WIndexing) was a guest during #ePrdctn Hour on Twitter. Wright is the owner of Wright Information Indexing Services and has been indexing and taxonomizing since 1991. She is currently co-chairing the ASI’s Digital Trends Task Force, focusing on eBook indexes, and was part of the team that wrote the IDPF EPUB 3 Indexes Working Group Charter Document to implement indexes in EPUB. Wright shared information on the InDesign scripts she has been working on with Olav Martin Kvern for creating indexes in ePUBs. Wright also was a guest on the eBook Ninjas podcast Episode 68 this week, where she discussed eBook indexing and the IDPF Indexes Charter with the staff of eBook Architects.
Wright was generous enough to go into more detail with ePUBSecrets about her work creating indexes for ePUBs in InDesign.
The scripting for InDesign will be released under Creative Commons. Olav Martin Kvern developed the scripts, and once we finish this test book, we will figure out where to host the scripts, and document how they work.
One script adds a hyperlink text destination to each paragraph, plus one midway through if the paragraph is long. Each destination has a note inserted beside it, showing the contents of the link.
One checks the destinations to make sure that the page number they picked up and used as part of the link is the page that they are on.
One removes all the notes and links if things are messed up.
One checks the index file and matches the links used in the index file to actual links in the InDesign book, and lists any errors so you can fix them.
And then there is a GREP routine that takes a placed index that has hyperlinks in its locators, and strips out all the code so that it looks like a plain old index, for print purposes.
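The idea behind that last cleanup step can be sketched outside InDesign. This is only an illustration in Python, not the actual GREP routine: it assumes, purely for the example, that each locator is wrapped in an HTML-style anchor such as `<a href="#p012_a">12</a>`. The real markup in the placed index, and the destination names, may look quite different.

```python
import re

# Assumption for this sketch: locators are wrapped in HTML-style anchors.
# The destination names (p012_a, p015_b) are made up for illustration.
LINK_PATTERN = re.compile(r"<a\b[^>]*>(.*?)</a>", re.IGNORECASE | re.DOTALL)


def strip_links(index_text):
    """Remove the link wrapper around each locator, keeping the visible page number."""
    return LINK_PATTERN.sub(r"\1", index_text)


if __name__ == "__main__":
    linked = 'apples, <a href="#p012_a">12</a>, <a href="#p015_b">15</a>'
    print(strip_links(linked))
```

The pattern keeps whatever sits between the opening and closing tags, so the print index shows only the page numbers.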
There’s a bit of a workflow that needs to be clear when you use the set of scripts, which we have been figuring out as we go along. Of course, it’s for eBooks, it can’t be simple, right?
What is cool, though, is that once the scripts have run on the book, nothing else happens to the InDesign book files. So no layout changes, no messing up fonts, no accidental image repositioning, nada, just the insertion of hyperlinks and a note at each hyperlink that shows the link’s address.
There are advantages and disadvantages to doing it this way.
One advantage is that, until InDesign decides to stop stripping out the index markers in the ePUB export, this is one way to get eBook indexes that really work, and that will leap to the paragraph level. We may also be creating the predecessor of what might be needed once the Working Group finishes its work. We’ll see what the final implementation is, but I suspect a separate index file may be supplied with an ePUB and marked as such in the navigation file.
We also get to:
- Sort the indexing however it needs to be (word-by-word or letter-by-letter). Yes, there are differing systems for how to alphabetize. InDesign’s default doesn’t follow either system, so we have to force it to sort correctly. In our own software, we can set it and forget it.
- See faster delivery time for the index. (Our software makes indexing much, much faster, and much easier on the carpal tunnel. InDesign’s index entry dialog box is so mouse-heavy, as is the palette. And it is slow to open and close that box for every entry. There should be a separate floating toolbar instead.)
- Use bold and italic text in the index terms themselves. InDesign bolds the page number if you want, but the lack of bolding and italics in the phrasing has been an issue.
- Keep your files frozen for less time than they would be if someone were embedding index entries in InDesign, because we are using our own software, which is much faster. The files are frozen while the scripts run, we check them to see if a few more hyperlinks might be needed, and then they can go back to the publisher. (But major text rolls to other pages should be avoided.)
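The difference between the two alphabetizing systems mentioned in the list above is easiest to see with a small example. The following is a simplified sketch in Python, not how Cindex, Sky, or Macrex actually sort: letter-by-letter ignores spaces and punctuation, while word-by-word compares one word at a time, so a shorter first word files ahead of a longer one. The function names are made up for this illustration, and real indexing rules handle many more cases (prefixes, numbers, leading articles).

```python
def letter_by_letter_key(heading):
    """Ignore spaces and punctuation; compare the heading as one run of letters."""
    return "".join(ch for ch in heading.lower() if ch.isalnum())


def word_by_word_key(heading):
    """Compare word by word, so 'New' files before 'Newark'."""
    return [
        "".join(ch for ch in word if ch.isalnum())
        for word in heading.lower().split()
    ]


if __name__ == "__main__":
    headings = ["Newark", "New York", "New Zealand", "Newton"]
    print(sorted(headings, key=letter_by_letter_key))
    print(sorted(headings, key=word_by_word_key))
```

With these four headings, letter-by-letter puts Newark and Newton ahead of New York, while word-by-word groups the two "New" headings first.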
One disadvantage is that if you want your print index to exactly match your eBook index, you need to freeze your InDesign pages so that there is no text rolling after indexing begins. We aren’t using InDesign’s embedded markers to generate the indexes, so the page numbers won’t update automatically. This freezing is hard on publishers, but the benefits of having an index working for both print and eBook may outweigh it. (And our own software does allow us to sort in locator order, so we can adjust things if need be for minor rolls. We usually charge extra, though.)
The other disadvantage is that you need professional indexing software.
Professionals use all kinds of packages to index when they don’t need to embed markers, but there are three big ones: Cindex, Sky, and Macrex.
I’m a Cindex user, simply because it has always had the capability to export and import tab-delimited files that work in all kinds of applications, and it is easy to use. The InDesign scripts don’t care which program you use, but you have to be able to output a styled Word index file with the hyperlink locators ready to be converted into ePUB or Mobi (or KF8). You also need to be able to export a file with a unique ID for each index entry, along with the locator link, so it can be checked by the error-finding script. All of these packages can do that.
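To make the checking step concrete, here is a rough Python sketch of what an error-finding pass over such an export might do. It is not Kvern’s script, and everything about the file layout is an assumption made for illustration: three tab-delimited columns (entry ID, heading, locator link), with made-up IDs and destination names. The list of destinations stands in for whatever the InDesign book actually contains.

```python
import csv
import io


def read_index_export(text):
    """Parse tab-delimited rows of: entry ID, heading, locator link (assumed layout)."""
    rows = []
    reader = csv.reader(io.StringIO(text), delimiter="\t", quoting=csv.QUOTE_NONE)
    for record in reader:
        if len(record) != 3:
            continue  # this sketch simply skips blank or malformed lines
        entry_id, heading, locator = (field.strip() for field in record)
        rows.append({"id": entry_id, "heading": heading, "locator": locator})
    return rows


def find_broken_locators(rows, book_destinations):
    """Return the rows whose locator matches no destination in the book."""
    known = set(book_destinations)
    return [row for row in rows if row["locator"] not in known]


# Hypothetical sample data: IDs and destination names are invented.
EXPORT = "e001\tapples\tp012_a\ne002\tpears\tp099_z\n"
DESTINATIONS = ["p012_a", "p013_a"]

if __name__ == "__main__":
    for row in find_broken_locators(read_index_export(EXPORT), DESTINATIONS):
        print(f"{row['id']} ({row['heading']}): no destination named {row['locator']}")
```

Because every entry carries its own ID, a report like this can point the indexer straight at the entry that needs fixing.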
This is a workaround, but in some ways, in the future, indexing needs to be out-of-the-files, and unique IDs need to be there. As long as there are unique paragraph IDs in any kind of text, XML, Word, InDesign, whatever, indexing can be done using the IDs and not have to interfere with the files. As long as we can link the entries to the location, we can generate an index. The beauty of having a linked relationship is that we can change both sides of the bond: rephrase on this side after thinking about it, update phrasing to reflect new ideas. I think we’ll see a system where you call the index, so that people can still work on terminology until the call comes.
I’ve done a ton of embedded indexing in Word, Frame, PageMaker, and InDesign, and the hardest part is the primitiveness of the work process when you embed using those modules.
Compared to our own packages, it is slow, laborious, limited, and painful.
I actually prefer Word and Frame, because we have add-ons for those that allow us to work faster. For some reason, good indexing add-ons haven’t been built for InDesign: ones that replace the palette with something that works more like a database, because index entries are basically database records, and we are generating reports. That kind of focus hasn’t come to InDesign tools. And we like our own packages because you almost never need to use a mouse; everything has key commands.
What do you think of Wright’s proposed indexing workflow? What are you using to index ePUBs?