Indexes in ebooks: Part 3

Steve Ingle of WordCo completes his deep dive into indexing for ebooks. Steve will take part in #eprdctn hour on Twitter next Wednesday, September 9, 2015, at 11AM (EST). If you miss it, see the Resources page for the Storify compilation.

Why don’t all digital nonfiction books have hyperlinked indexes? If publishers are serious about asking readers to transition to digital books, shouldn’t there be indexes, and shouldn’t they be at least as good as, if not BETTER than, their print-book equivalent? If you don’t think the answer is “YES!!!,” you can stop reading right now.

Let’s look at three indisputable facts that I have addressed in earlier blog posts:

  1. We live in a world where nonfiction books often coexist in print and ebook formats. We’re going to live with this duality for the foreseeable future, at least until the next “game-changer” comes along.
  2. The digital version of a nonfiction book, to its detriment, often lacks an index, or includes a simple (non-hyperlinked) reproduction of the print index. The reason for this is often ignorance (“ebook users don’t need indexes because they can just search”), or perceived or actual constraints (“it’s going to cost too much to add hyperlinks” or “it’s just going to take too long”).
  3. Creating an index is a very labor-intensive and convoluted process, especially for longer, complex publications. Anyone who has tried to index a book knows that what at first appears to be a simple process (hey, I’m just choosing words and putting them in alphabetical order) quickly bogs down into a soupy mess. It takes real skill and a lot of work to create a good index. Asking indexers to insert tags on top of what they already do is likely to drive them over the proverbial edge.

Based on these facts, it is fair to conclude that if we want consistently good indexes in digital nonfiction books, we need a way to produce a hyperlinked index that does not take the indexer appreciably longer to create than the print-book index alone.

The key is to eliminate tagging by the indexer (we’re not taggers, we’re INDEXERS, dammit!), and replace it with an enhanced version of what the indexer already does when print-book indexing. Remember, physical tagging may not be too bad for a fairly short nonfiction trade book, but it quickly becomes overwhelming with a highly detailed textbook or professional book where there is a massive amount of content on every page.

Here’s one potential solution

Suppose the document being indexed has unique numbers (IDs) assigned to each element (heading, paragraph, bullet point, etc.). The level of granularity could be predetermined (e.g., it can be specified that any paragraph over 100 words will include IDs). Now suppose the indexer does her indexing using her normal method, but she also adds the unique IDs into the indexing software along with the page locator. Because most indexing software will automatically complete the page reference information from the previous record, the indexer does not have to re-key the entire ID.
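To make the ID step concrete, here is a minimal sketch of how unique IDs might be assigned to the elements of one XHTML content file. It is only an illustration under stated assumptions: the function name `assign_ids`, the `loc` prefix, the set of tags that always get an ID, and the reading of the 100-word rule (only paragraphs longer than 100 words receive an ID) are all hypothetical choices, not a description of WordCo's actual tooling.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"
# Keep the XHTML namespace as the default namespace when serializing.
ET.register_namespace("", XHTML_NS)

# Hypothetical granularity rules: these tags always get an ID,
# paragraphs only when they exceed the word threshold.
ALWAYS_TAG = {"h1", "h2", "h3", "li"}
MIN_PARAGRAPH_WORDS = 100


def assign_ids(xhtml: str, prefix: str = "loc") -> str:
    """Return the XHTML with sequential IDs added to qualifying elements."""
    root = ET.fromstring(xhtml)
    counter = 0
    for elem in root.iter():
        local = elem.tag.split("}")[-1]  # strip the namespace part of the tag
        words = len("".join(elem.itertext()).split())
        wanted = local in ALWAYS_TAG or (
            local == "p" and words > MIN_PARAGRAPH_WORDS
        )
        # Never overwrite an ID that is already present in the file.
        if wanted and "id" not in elem.attrib:
            counter += 1
            elem.set("id", f"{prefix}{counter:05d}")
    return ET.tostring(root, encoding="unicode")
```

In a real book the counter would have to stay unique across all content files, for example by including the chapter number in the prefix.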

The indexer is still indexing the book ONCE, along with adding some additional information (the element ID) to each index record within the indexing software.

What’s so great about this? Well, we now have a database file (the index with its various headings, subheadings, and page references) that can be linked to the ePub (or other) file that contains the unique IDs. With a few simple scripts, we can generate an index that can be added to the book file(s) and whose entries hyperlink to specific locations within the digital book.
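As a rough illustration of what such a script could look like, the sketch below turns index records that carry an element ID into a nested HTML list whose locators link to those IDs. The record layout (`IndexRecord`), the function name `build_linked_index`, and the file names are assumptions made for this example; real indexing software exports its own formats, and this is not the ePub3 index markup.

```python
from dataclasses import dataclass
from html import escape
from itertools import groupby


@dataclass
class IndexRecord:
    heading: str
    subheading: str   # empty string when the entry has no subheading
    page: str         # print-book locator, shown as the link text
    element_id: str   # unique ID of the target element in the ebook
    file: str         # content file that contains that element


def _links(recs):
    """Render the locators of one entry as a comma-separated run of links."""
    if not recs:
        return ""
    anchors = [
        f'<a href="{escape(r.file)}#{escape(r.element_id)}">{escape(r.page)}</a>'
        for r in recs
    ]
    return ", " + ", ".join(anchors)


def build_linked_index(records):
    """Return an HTML list of headings and subheadings with hyperlinked locators."""
    ordered = sorted(
        records,
        key=lambda r: (r.heading.lower(), r.heading,
                       r.subheading.lower(), r.subheading),
    )
    lines = ["<ul>"]
    for heading, group in groupby(ordered, key=lambda r: r.heading):
        group = list(group)
        main = [r for r in group if not r.subheading]
        subs = [r for r in group if r.subheading]
        lines.append(f"  <li>{escape(heading)}{_links(main)}")
        if subs:
            lines.append("    <ul>")
            for sub, sub_group in groupby(subs, key=lambda r: r.subheading):
                lines.append(f"      <li>{escape(sub)}{_links(list(sub_group))}</li>")
            lines.append("    </ul>")
        lines.append("  </li>")
    lines.append("</ul>")
    return "\n".join(lines)
```

The sort here is a plain case-insensitive alphabetical sort; a production script would need to follow the indexer's own sorting rules (word-by-word versus letter-by-letter, ignored leading prepositions, and so on).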

Now even this process, while much easier than tagging, still takes some getting used to for the indexer. We at WordCo have learned through our mistakes. But this process can mesh with publishers’ existing workflows and produce outstanding results.

Solving the problem of getting quality hyperlinked indexes into nonfiction ebooks in a way that meshes with publishers’ workflows AND puts realistic demands on indexers is Goal #1.

This is just the first step in how indexes can encourage the migration of users from print to digital.

It’s useful to remember that when Gutenberg invented the printing press in the mid-1400s, his goal was to come up with a more efficient way of copying pages from the Bible. The resulting spread of literacy, leading to the Reformation and growth of democracy, was certainly not Gutenberg’s immediate aim; he was just trying to solve a problem. But his invention was indeed revolutionary.

Are ebooks truly revolutionary?

I’m not sure, but envisaging digital books as simply a digital version of the print book (and nothing more) does a disservice to their potential.

The purpose of nonfiction books is learning, whether we’re talking about a “light” business book on sales techniques, or a “heavy” textbook on differential equations.

Anything that facilitates learning has the potential to create a more rewarding experience for users, especially if it does so in the digital format in a way the print book can’t do, or do well. The index can be an integral part of this.

Going outside of the box

Here’s one example of how the above-described technique for creating hyperlinked indexes might be used to enhance the learning process for users of digital nonfiction books.

Suppose, as the indexer of a business book on leadership, I decide to apply unique labels (which I can easily do with my indexing software) to various categories of index entries (for example, people, companies, strategies).

Since I have an index file linked to the text via unique IDs, it would be fairly simple to use scripts to add tags that include this additional information. If my reading platform could utilize this information, perhaps the end-user could instruct the index to display all people’s names in red, all companies in blue, and all strategies in green.
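One way this might look in practice is sketched below: each category label becomes a class on the index entry, and the reader's color choices become a small stylesheet. Everything here is hypothetical, including the class names, the function names `tag_entry` and `reader_stylesheet`, and the assumption that a reading platform would let the user supply colors at all.

```python
from html import escape

# Hypothetical mapping from the indexer's category labels to CSS classes.
CATEGORY_CLASSES = {
    "person": "idx-person",
    "company": "idx-company",
    "strategy": "idx-strategy",
}


def tag_entry(heading, category=None):
    """Render one index entry, adding a class when the entry has a known category."""
    css = CATEGORY_CLASSES.get(category)
    if css is None:
        return f"<li>{escape(heading)}</li>"
    return f'<li class="{css}">{escape(heading)}</li>'


def reader_stylesheet(colors):
    """Build CSS rules from the reader's choices, e.g. {"person": "red"}."""
    rules = []
    for category, color in colors.items():
        rules.append(f".{CATEGORY_CLASSES[category]} {{ color: {color}; }}")
    return "\n".join(rules)
```

For example, `reader_stylesheet({"person": "red", "company": "blue", "strategy": "green"})` would produce the three color rules described above, while uncategorized entries stay in the default text color.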

Of course, current platforms don’t yet do this. But if it’s feasible, and the need could be demonstrated, why not?


Finally, what about ePub3 and its indexing specification? While ePub3 does address indexes and indexing, its spec attempts to create “standards” for digital book indexes without really addressing workflow concerns. As they sometimes say in New England,

You can’t get there from here.

In other words, it all sounds great, but how do we indexers honor the specs and still make a buck? Besides, I wouldn’t bet cash that ePub3 will be the standard 20 or even 10 years from now; it could be HTML5 or something completely different.

The real, long-term challenge, beyond solving existing production concerns, is to create innovative prototypes of what digital indexes and digital “books” could look like. Why remain stuck in the back-of-the-book model of the alphabetical, static, but perhaps hyperlinked, index? I believe the game-changer is out there, if only in someone’s imagination.

I hope that these three blog posts have stirred things up. I hope they will encourage a fruitful dialogue on indexes in ebooks, and the role of indexes in general. Indexers, compositors, designers, editors, software developers, even the budget and marketing people: we all have a role to play in coming up with solutions. Let’s work as a team and keep the conversation going!

Stephen Ingle is the president and CEO of WordCo Indexing Services, located in Norwich, Connecticut. He created his first index (8 lines) at the age of 10. After graduating from Yale University with a degree in German literature, he went on to earn master’s degrees in German and Russian Area Studies. In 1988, Steve began freelance indexing part time while also working at the Modern Language Association (MLA) in New York. He began indexing full time in 1991. Steve has served on the national board of the American Society for Indexing. His company now employs a team of indexers and completes about 500 projects annually for a diverse group of clients. His interests include indexing as a business and indexes for digital publications.


One Response to “Indexes in ebooks: Part 3”

  1. Patricia Erickson says:

    I have read your 3-part series about indexing book, and it is very interesting. I think you need to take a look at this program: PDF Index Generator.

    What’s good about it is the hyperlinking index entries, and the ability to use regular expressions to find the entries to index automatically inside the book.

    What’s not good about it is that it supports only pdf files, but doesn’t support epub books.

    Thank you for this wonderful series.