Jan Wright on Creating ePUB Indexes in InDesign
On Wednesday, March 7, 2012, Jan Wright (@WIndexing) was a guest during #ePrdctn Hour on Twitter. Wright is the owner of Wright Information Indexing Services and has been indexing and taxonomizing since 1991. She is currently co-chairing the ASI’s Digital Trends Task Force, focusing on eBook indexes, and was part of the team that wrote the IDPF EPUB 3 Indexes Working Group Charter Document to implement indexes in EPUB. Wright shared information on the InDesign scripts she has been working on with Olav Martin Kvern for creating indexes in ePUBs. Wright also was a guest on the eBook Ninjas podcast Episode 68 this week, where she discussed eBook indexing and the IDPF Indexes Charter with the staff of eBook Architects.
Wright was generous enough to go into more detail with ePUBSecrets about her work creating indexes for ePUBs in InDesign.
The scripting for InDesign will be released under Creative Commons. Olav Martin Kvern developed the scripts, and once we finish this test book, we will figure out where to host the scripts, and document how they work.
There are several scripts in the set. One script adds a hyperlink text destination to each paragraph, plus one midway through if the paragraph is long. Each destination has a note inserted beside it, showing the contents of the link.
One checks the destinations to make sure that the page number each one picked up and used as part of its link is the page it is actually on.
One removes all the notes and links if things are messed up.
One checks the index file, matches the links used in it to the actual links in the InDesign book, and lists any errors so you can fix them.
And then there is a GREP routine that takes a placed index that has hyperlinks in its locators, and strips out all the code so that it looks like a plain old index, for print purposes.
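The GREP cleanup step can be pictured with a small sketch. This is not the routine from the scripts. It assumes, purely for illustration, that each linked locator shows up in the index text as an HTML-style anchor such as `<a href="...">47</a>`, and it uses a Python regular expression to keep only the visible page number. The markup pattern, the function name, and the sample link targets are all hypothetical.

```python
import re

# Assumed markup for a linked locator; the real placed index may look different.
LINKED_LOCATOR = re.compile(r'<a\s+href="[^"]*">(.*?)</a>')

def strip_locator_links(index_line):
    """Replace each linked locator with its visible text only."""
    return LINKED_LOCATOR.sub(r"\1", index_line)

line = 'taxonomies, <a href="ch02.html#p0047">47</a>, <a href="ch05.html#p0112">112</a>'
print(strip_locator_links(line))  # taxonomies, 47, 112
```

The same idea carries over to a GREP find/change: match the link wrapper, capture the display text, and write back only the captured group.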
There’s a bit of a workflow that needs to be clear when you use the set of scripts, which we have been figuring out as we go along. Of course, it’s for eBooks, it can’t be simple, right?
What is cool, though, is that once the scripts have run on the book, nothing else happens to the InDesign book files. So no layout changes, no messing up fonts, no accidental image repositioning, nada, just the insertion of hyperlinks and a note at each hyperlink that shows the link’s address.
There are advantages and disadvantages to doing it this way.
One advantage is that, until InDesign decides to stop stripping out the index markers in the ePUB export, this is one way to get eBook indexes that really work, and will leap to the paragraph level. We may also be creating the predecessor of what might be needed once the Working Group finishes its work. We’ll see what the final implementation is, but I suspect a separate index file may be supplied with an ePUB and marked as such in the navigation file.
We also get to:
- Sort the indexing however it needs to be (word-by-word or letter-by-letter). Yes, there are differing systems for how to alphabetize. InDesign’s default doesn’t follow either system, so we have to force it to sort correctly. In our own software, we can set it and forget it.
- See faster delivery time for the index. (Our software makes indexing much, much faster, and much easier on the carpal tunnel. InDesign’s index entry dialog box is so mouse-heavy, as is the palette. And it is slow to open and close that box for every entry. There should be a separate floating toolbar instead.)
- Use bold and italic text in the index terms themselves. InDesign bolds the page number if you want, but the lack of bold and italics in the phrasing has been an issue.
- Keep your files frozen for less time than if someone were embedding entries in InDesign, because we are using our own software, which is much faster. The files are frozen while we run the scripts, we check them to see whether a few more hyperlinks might be needed, and then they can go back to the publisher. (But major text rolls to other pages should be avoided.)
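The difference between the two alphabetizing systems mentioned in the list is easy to see in a short sketch. This is a simplified illustration in Python, not how any indexing package implements sorting: real style guides add further rules for punctuation, numbers, and non-ASCII letters that are ignored here, and both function names are made up for the example.

```python
import re

def letter_by_letter_key(term):
    # Ignore spaces and punctuation and compare the letters as one string
    # (ASCII only, which is a simplification).
    return re.sub(r"[^a-z0-9]", "", term.lower())

def word_by_word_key(term):
    # Compare one word at a time, so a short first word sorts ahead of
    # a longer word that begins with the same letters.
    return re.findall(r"[a-z0-9]+", term.lower())

terms = ["Newark", "New York", "New Jersey", "newspapers"]

print(sorted(terms, key=word_by_word_key))
# ['New Jersey', 'New York', 'Newark', 'newspapers']
print(sorted(terms, key=letter_by_letter_key))
# ['Newark', 'New Jersey', 'newspapers', 'New York']
```

The same four terms come out in a different order under each system, which is why the sort has to be chosen deliberately rather than left to a default.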
One disadvantage is that if you want your print index to exactly match your eBook index, you need to freeze your InDesign pages so that no text rolls after indexing begins. We aren’t using InDesign’s embedded markers to generate the indexes, so page numbers won’t update automatically. This freezing is hard on publishers, but the benefits of having an index working for both print and eBook may outweigh it. (And our own software does allow us to sort in locator order, so we can adjust things if need be for minor rolls. We usually charge extra, though.)
The other disadvantage is that you need professional indexing software.
Professionals use all kinds of packages to index when they don’t need to embed markers, but there are three big ones: Cindex, Sky, and Macrex.
I’m a Cindex user, simply because it has always had the capability to export and import tab-delimited files that work in all kinds of applications, and it is easy to use. The InDesign scripts don’t care which program you use, but you have to be able to output a styled Word index file with the hyperlink locators ready to be converted into ePUB or Mobi (or KF8). You also need to be able to export a file with a unique ID for each index entry, along with its locator link, so it can be checked by the error-finding script. All of these packages can do that.
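As a rough picture of what that error check involves, here is a minimal Python sketch. It is not the actual script. It assumes the export is a tab-delimited file with an entry ID in the first column and a locator link in the second, and that the destinations found in the book are available as a set of names. The column layout, the function name, and the sample IDs are all assumptions made for illustration.

```python
import csv
import io

def find_broken_locators(export_text, known_destinations):
    """Return (entry_id, link) pairs whose link has no matching destination."""
    rows = csv.reader(io.StringIO(export_text), delimiter="\t")
    return [(row[0], row[1]) for row in rows
            if len(row) >= 2 and row[1] not in known_destinations]

# Hypothetical export: entry ID, then the locator link used in the index.
export_text = "e001\tch02_p047_a\ne002\tch05_p112_b\ne003\tch09_p201_a\n"
# Hypothetical set of destinations that actually exist in the book files.
known = {"ch02_p047_a", "ch05_p112_b"}

print(find_broken_locators(export_text, known))  # [('e003', 'ch09_p201_a')]
```

Because each row carries a unique entry ID, any locator that fails the check can be traced straight back to the entry that needs fixing.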
This is a workaround, but in some ways, in the future, indexing needs to be done outside the files, and unique IDs need to be there. As long as there are unique paragraph IDs in any kind of text (XML, Word, InDesign, whatever), indexing can be done using the IDs and not have to interfere with the files. As long as we can link the entries to the location, we can generate an index. The beauty of having a linked relationship is that we can change both sides of the bond: rephrase on this side after thinking about it, update phrasing to reflect new ideas. I think we’ll see a system where you call the index, so that people can still work on terminology until the call comes.
I’ve done a ton of embedded indexing in Word, Frame, PageMaker, and InDesign, and the hardest part is the primitiveness of the work process when you embed using those modules.
Compared to our own packages, it is slow, laborious, limited, and painful.
I actually prefer Word and Frame, because we have add-ons for those that allow us to work faster. For some reason, good indexing add-ons haven’t been built for InDesign: ones that replace the palette with something that works more like a database, because index entries are basically database records, and we are generating reports. That kind of focus hasn’t come to InDesign tools. And we like our own packages because you almost never need to use a mouse; everything has key commands.
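The "records and reports" view of an index can be sketched in a few lines of Python. This is only an illustration of the idea, not the data model of Cindex, Sky, or Macrex. The record fields, the class name, and the sample entries are assumptions.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass(frozen=True)
class IndexEntry:
    heading: str     # main heading as it should print
    subheading: str  # empty string when there is none
    locator: str     # page number or link target

def index_report(entries):
    """Group records by heading and subheading, then format them as lines."""
    grouped = defaultdict(lambda: defaultdict(list))
    for entry in entries:
        grouped[entry.heading][entry.subheading].append(entry.locator)
    lines = []
    for heading in sorted(grouped, key=str.lower):
        subs = grouped[heading]
        main = ", ".join(subs.get("", []))
        lines.append(f"{heading}, {main}" if main else heading)
        for sub in sorted(s for s in subs if s):
            lines.append(f"    {sub}, {', '.join(subs[sub])}")
    return lines

records = [
    IndexEntry("sorting", "word-by-word", "15"),
    IndexEntry("hyperlinks", "", "5"),
    IndexEntry("sorting", "letter-by-letter", "15"),
    IndexEntry("hyperlinks", "", "11"),
]
print("\n".join(index_report(records)))
```

Entries can be typed in any order and rephrased at any time; the formatted index is just a report generated from the records when it is needed.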
What do you think of Wright’s proposed indexing workflow? What are you using to index ePUBs?
Long-time readers may remember Jan from this blog post, too: http://indesignsecrets.com/real-world-indesign-cs4-wins-best-index-award.php
The scripts talked about above are out and ready for people to experiment with. Check out http://www.wrightinformation.com/Indesign%20scripts/Indesignscripts.html for full details and links.
Most of the publishers we work with require human-generated indexes, and then want a linked index when the epub is created, with links from the page numbers to the actual page. So I don’t think this scripting works for us, right? I’m looking to see if there’s a script that will put a destination text anchor at the top of each page of the InD file, and then create a link to that anchor from the page number listed in the index. Yes, it is odd to have page numbers in an epub that don’t correspond to an actual page, but users will get it, I think.
Hey David,
These scripts are designed for use by humans. If the indexers you are using work in Sky, Cindex, or Macrex, the scripts will work for you. Those are indexing programs that output index files, and it is simply a matter of substituting a href links for simple page numbers. Or you can use anything as the display for the link. It’s easiest right now to use a page number if you want to match a print edition of the book. The destination text anchors are put into every paragraph, not just every page top. We’ve found that in ereading, page tops mean you can be off by as many as 4 screens on an ereader.
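To make the substitution concrete, here is a small Python sketch of an index line in which each plain page number is replaced by a link whose display text is still the page number. It is an illustration only: the function name, the file names, and the anchor IDs are hypothetical, and the real link targets come from the destinations the scripts insert.

```python
from html import escape

def linked_entry(heading, locators):
    """Build one index line; each locator is a (display_text, href) pair."""
    links = [f'<a href="{escape(href, quote=True)}">{escape(text)}</a>'
             for text, href in locators]
    return f"{escape(heading)}, {', '.join(links)}"

# Hypothetical targets: one anchor per paragraph, displayed as a page number.
print(linked_entry("taxonomies", [("47", "ch02.xhtml#p047a"),
                                  ("112", "ch05.xhtml#p112b")]))
```

The display text and the target are independent, so the same entry could show a page number, a section title, or anything else while still landing on the right paragraph.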
Users should still get page numbers for a while. They do show how densely a subject is covered, and for how long, so they still convey some information. [1], [2], [3] systems don’t tell me as much as 7, 8, 11-14, 92 does.
Please feel free to contact me for information on the scripts.
Thanks, Jan. I’ll check this out. (Yes, the professional indexers use Cindex; unfortunately, the authors are often doing the indexes themselves, even for the university presses we work with!)
I have an InDesign file that needs to be converted to EPUB, with each index page number linked to its exact location in the text and each locator word linked to its position in the index. The index is generated from within InDesign only, and it has hyperlinks for the entries.
Please advise.
Hi Dinesh,
If you are using InDesign CC, and you have used InDesign’s indexing module to insert index entries, you can compile the index and lay it out as a chapter of the book. Then, when you convert the book to EPUB, the index will be hyperlinked back into the text, to the point where the markers were inserted. CC is the only version that does this.
LiveIndex does exactly that: http://www.id-extras.com/products/liveindex
It will convert all the page numbers in a preexisting (human-generated) index into clickable hyperlinks. If the target output is ePub, it adds text anchors to the top of all referenced pages and links the index to those. If the target output is PDF, it simply links straight to the page itself.