IDPF ePUB 3.0 Indexes Working Group Charter
On January 31, 2012, the IDPF member groups approved the charter for a working group to add index support to ePUB. The EPUB 3 Indexes Charter has been released and is available on the IDPF website and on code.google. You can also follow the ePUB Working Group here.
The ePUB 3 Indexes Charter
The charter is a document that outlines the need for indexes in ePUBs, the scope of the charter itself including integration constraints, the use cases, the needed publication properties, and reading system behaviors with indexes. I will take a look at each of these areas of the charter in this post.
Need for this proposal
Indexes were not specifically covered in the final ePUB 3.0 spec. The charter seeks to add a module to ePUB 3.0 that will provide a specification for indexes in ePUBs, while also addressing Item 6 on enhanced navigation support in the ePUB Revision Working Group Charter. Indexes in print books are largely a simple navigational tool that allows one-way navigation from an item in the index to the page where information on the item resides. To the IDPF’s credit, the charter looks to leverage the digital nature of ePUBs and the power of linking to content both inside and outside an ePUB. The charter clearly draws lines between indexes and search when it states, “Indexes are specialized navigational and supplemental information tools that offer readers an interaction with content that is enhanced, more powerful, and more specific than simple search.” The expectation that indexes in ePUBs should far surpass search is clear in the way that the charter defines how indexes are used:
Readers use indexes in a variety of ways: to quickly locate discussions in content, to discover relevant content that is discussed with differing synonyms, to discover new terminology for concepts, and to see details of topics covered in an eBook. Indexes convey a sense of the depth of topic coverage in an eBook, break down large concepts into important subcategories, and allow exploration of content through granular and user-friendly access points. Indexes provide the added value of human analysis, enabling an interactive conversation between the reader and the book. Indexers are not constrained to use as entries the terms used by the author, or even in some cases only the terms that appear in the entire document: indexers are focused on meanings, not just words. Indexes are also a pre-coordinate search system, as opposed to search’s propensity to being post-coordinate.
This last point is important and worth examining. Coordination means the building or assembling of (usually subject) search terms. In a precoordinate system, the record or document creator designates the search terms from an already authoritative source. Library of Congress Subject Headings (LCSH) is one such source. An important distinguishing characteristic is that concepts are usually represented by a single term drawn from the authoritative source. In a postcoordinate system, the information creator may create lists of terms or in the case of full-text presentation may allow the text itself to generate search terms. Multiple terms representing concepts may be acceptable. It is postcoordinate because the end user selects search terms.
Indexes leverage the expertise of the indexer as both a subject matter expert and as an organizer of similar ideas under a single term, which may be a synonym for, rather than a copy of, the actual text that appears in a book. In a way, an index uses an authoritative list of terms to locate every place in a book where a subject/topic is written about.
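To make the precoordinate/postcoordinate distinction concrete, here is a minimal sketch in Python. The page text, the heading "automobiles", and the "see" references are all made up for illustration; nothing here comes from the charter itself.

```python
# Hypothetical book text and index; all names and data are illustrative only.
PAGES = {
    1: "Cars need regular oil changes.",
    2: "An automobile engine burns fuel.",
    3: "Motor vehicles must be registered.",
}

# Precoordinate: the indexer picked one preferred heading ahead of time and
# gathered every relevant page under it, whatever wording the author used.
INDEX = {"automobiles": [1, 2, 3]}
SEE_REFERENCES = {"cars": "automobiles", "motor vehicles": "automobiles"}


def index_lookup(term: str) -> list[int]:
    """Resolve a term through 'see' references, then return its locators."""
    heading = SEE_REFERENCES.get(term.lower(), term.lower())
    return INDEX.get(heading, [])


def full_text_search(term: str) -> list[int]:
    """Postcoordinate: the reader's own word is matched against the text."""
    return [page for page, text in PAGES.items() if term.lower() in text.lower()]


print(index_lookup("cars"))      # [1, 2, 3]
print(full_text_search("cars"))  # [1]
```

The index finds all three pages because a human grouped the synonyms in advance, while the plain search only finds the page that happens to use the reader's word.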
Scope
In-scope (Deliverables)
The scope of the charter defines three top-level functional properties that should be available in ePUB indexes:
- Allow users to read or browse an index in full chapter-like format
- Allow users to quickly access index information in a search context
- Allow users to see index entries associated with a range of text
The first property is a mirror of what readers have in a print index. The second property combines the power of an index with search. The third is a new animal: a reverse index (more on that below).
Out of Scope
The charter also lists a number of items that are out of scope. These include how to use any eBook creation tools to implement indexing, the order of main heads and subheads (in other words, there is no authoritative source provided by the IDPF; that is left to the discretion of the indexer/publisher), suggested presentation formats of the index, and system-oriented functionality for fast lookup, reverse lookup, and retrieval (typically described in terms of a database-like file). In a way, the IDPF is leaving it up to publishers, and in some cases reading platforms, to determine how the items on this list are handled, except where the integration constraints apply.
Integration Constraints
Publishers and ePUB3 reading platforms can handle the out of scope elements as they see fit, except that indexes must integrate with ePUB 3 by including the following:
- Graceful fallback: it must allow EPUB 3 Reading Systems to open and reasonably render Publications containing the mechanism, even if the Reading System has not been updated to explicitly support the mechanism.
- Native grammars and extension points: it must utilize EPUB 3 Content Document grammars to the maximum extent possible, and it must only use extension points defined within EPUB 3 and XHTML 5.
- Shallow implementation: Reading System implementation of the mechanism must not require changes to underlying (browser-based or other) XHTML rendering engines; full implementations must be possible on the Reading System level alone.
This basically means three things. One cannot create indexes in an ePUB with a mechanism that will break the ePUB on readers that don’t have specific support for that mechanism. One must use the existing standards of ePUB 3 and XHTML5 with respect to Content Document grammars and extension points. And the index needs to work with existing XHTML rendering engines and be able to run fully on the Reading System/Reader.
Use Cases
The charter includes four primary use cases for indexes. They are
- Chapter-like index
- Pop-up index
- Reverse index
- Standalone index
Each of these use cases exhibits a unique behavior. The inclusion of all four shows that the IDPF working group is thinking ahead with this proposed index module.
Chapter-like index
The chapter-like index is essentially the ePUB implementation of the print book index with live links, with a few extra tricks. A user must be able to navigate to the chapter-like index so they can browse topics and find information. Much like the navigational TOC, a user should be able to expand and collapse main headings/subheadings. In ePUB 2.0.1, there really was not a way to do this. If special symbols, prefixes or suffixes (daggers, ff, n, nn and so on) are used to annotate locations, they can be selected to display the meaning of the symbol.
As one would expect, a user clicks on index links to navigate to the book’s content. The charter also calls for contextual information (three to four words from each side of the target location in the text) to display if a user hovers over the index link. The charter also calls for the ability to click cross references to navigate to the target heading or to view a list of target headings within the index.
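The hover behavior is easy to picture with a small sketch. The function below is only an illustration of the "three to four words on each side" idea, assuming the target location is known as a character offset into plain text; the charter does not say how a reading system would actually compute the snippet.

```python
import re


def context_snippet(text: str, target_offset: int, words_each_side: int = 4) -> str:
    """Return the word at target_offset plus a few words on either side."""
    words = list(re.finditer(r"\S+", text))
    if not words:
        return ""
    # The hit is the first word that ends after the target offset.
    hit = next(
        (i for i, word in enumerate(words) if word.end() > target_offset),
        len(words) - 1,
    )
    start = max(0, hit - words_each_side)
    end = min(len(words), hit + words_each_side + 1)
    return " ".join(word.group() for word in words[start:end])


sample = "Indexes convey a sense of the depth of topic coverage in an eBook"
print(context_snippet(sample, sample.index("depth"), words_each_side=3))
# sense of the depth of topic coverage
```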
Pop-up index
The pop-up index case is a new function that finally takes advantage of the digital nature of ePUBs and probably the support of JavaScript. I know some eBook developers have been working on pop-up footnotes, but the IDPF working group shows again that it is forward thinking with the inclusion of pop-up indexes in the charter. The pop-up index is envisioned to have the following qualities: A user will be able to select a term or phrase in the text of an ePUB and trigger a pop-up view of the index displaying the first matching main heading.
The user will also be able to open the pop-up index from within the book’s content without selecting anything; the index will open at the top of the pop-up index or at the last-used position in the index. With the pop-up index displayed, a user will be able to browse for terms in the index or enter search terms; entering search terms will trigger stemming and auto-fill. As in the chapter-like index, the user will be able to expand or collapse main headings/subheadings in the index. Also like the chapter-like index, hovering over an index link will display contextual information about the entry, that is, three or four words on each side of the target location in the ePUB text. By clicking a link in the pop-up index, a user will be taken to that target location within the ePUB content.
Reverse index
The reverse index essentially will allow users to select a range of text (a paragraph, for example) in an ePUB and see a pop-up list of all in-context index entries. If a paragraph is selected, a user will be able to see how all indexed content within that paragraph is indexed. A user will be able to select any of this indexed content and access the selected entry in the pop-up index. This would allow a user to quickly find other locations in the ePUB where the content is included or discussed. One issue that is not addressed specifically is whether a single paragraph ID tag will be used for content within a paragraph or whether individual words or phrases within a paragraph will need their own ID tags.
Standalone index
The standalone index is another area where the IDPF working group is pushing the envelope. The standalone index will actually be an ePUB that consists of one or more master indexes that contain links to other ePUBs. Like a normal index and the chapter-like index, a user will be able to browse topics and find information and expand or collapse main headings/subheadings. The difference is that when a user clicks index links, they will be able to navigate to other ePUBs. How this will work is not defined, but the ability to link actual content from one ePUB to another has been something that some eBook developers have been dreaming about and trying to figure out for some time. It will be interesting to see how this will be accomplished.
Needed Publication Properties
In order for the indexes to work as outlined in the use cases, the ePUB will need certain publication properties. These are grouped into three buckets: package metadata, index links, and index presentation.
Package metadata
The package metadata is information that will be in the .OPF file that will identify indexes, information that will help in the navigation of indexes, and information that identifies what is in the index. First, there must be a way to declare an index that is contained in an ePUB via the package metadata (in the .OPF file).
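The charter does not say what the declaration should look like, so the sketch below is purely an assumption: it pretends an index is flagged with a hypothetical `properties="index"` value on a manifest item, modeled on the way EPUB 3 already flags the navigation document with `properties="nav"`. It shows how a reading system might discover such a declaration in the .OPF file.

```python
import xml.etree.ElementTree as ET

OPF_NS = "http://www.idpf.org/2007/opf"

# Sample package document. The "index" property value is hypothetical;
# the charter does not define the actual declaration mechanism.
SAMPLE_OPF = """
<package xmlns="http://www.idpf.org/2007/opf" version="3.0" unique-identifier="pub-id">
  <manifest>
    <item id="nav" href="nav.xhtml" media-type="application/xhtml+xml" properties="nav"/>
    <item id="ch1" href="chapter1.xhtml" media-type="application/xhtml+xml"/>
    <item id="idx" href="index.xhtml" media-type="application/xhtml+xml" properties="index"/>
  </manifest>
</package>
"""


def find_declared_indexes(opf_xml: str, marker: str = "index") -> list[str]:
    """Return hrefs of manifest items whose properties include `marker`."""
    root = ET.fromstring(opf_xml.strip())
    hrefs = []
    for item in root.iter(f"{{{OPF_NS}}}item"):
        # `properties` is a space-separated list of values.
        if marker in item.get("properties", "").split():
            hrefs.append(item.get("href"))
    return hrefs


print(find_declared_indexes(SAMPLE_OPF))  # ['index.xhtml']
```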
A reverse index should contain the same information as a chapter-like index but the information should be sorted in locator order (the order the locators appear in the ePUB) not in alphabetical order like a chapter-like index. The presence of a reverse index must also be declared in the package metadata.
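The difference between the two orderings can be sketched in a few lines of Python. The entries, file names, and offsets are invented; the only idea taken from the charter is that the same data is sorted alphabetically for the chapter-like index and by locator position for the reverse index.

```python
from typing import NamedTuple


class Entry(NamedTuple):
    heading: str
    document: str  # content document the locator points into
    offset: int    # position of the locator inside that document


# Hypothetical reading order and index entries.
SPINE = ["ch01.xhtml", "ch02.xhtml", "ch03.xhtml"]
ENTRIES = [
    Entry("metadata", "ch03.xhtml", 120),
    Entry("anchors", "ch02.xhtml", 45),
    Entry("links", "ch01.xhtml", 300),
    Entry("anchors", "ch01.xhtml", 10),
]


def locator_key(entry: Entry) -> tuple[int, int]:
    """Sort key for the order in which locators appear in the publication."""
    return (SPINE.index(entry.document), entry.offset)


def chapter_like_order(entries: list[Entry]) -> list[Entry]:
    """Alphabetical by heading, then by locator within each heading."""
    return sorted(entries, key=lambda e: (e.heading.lower(), locator_key(e)))


def reverse_index_order(entries: list[Entry]) -> list[Entry]:
    """Same entries, sorted purely by where they occur in the book."""
    return sorted(entries, key=locator_key)


print([e.heading for e in chapter_like_order(ENTRIES)])
# ['anchors', 'anchors', 'links', 'metadata']
print([e.heading for e in reverse_index_order(ENTRIES)])
# ['anchors', 'links', 'anchors', 'metadata']
```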
A publication that contains one or more standalone indexes must declare that it has a standalone index in the package metadata.
An index should contain “group break navigation data.” This is the navigation data used in the floating or persistent navigation feature of a chapter-like index. The group break navigation data must be declared in the package metadata.
If indexes contain special symbols, prefixes or suffixes (daggers, ff, n, nn and so on) the index-symbols list (legend) must be machine-discoverable. It must also be declared in the package metadata.
Finally, if an ePUB has semantic markup to display lists of related main headings for generic cross references, this capability must be declared in the package metadata.
With the increased functions associated with this new index module in ePUB 3.0, it will be important to declare that the index(es) are present, what index-related content is included, and how the index(es) should function.
Index links
Indexes in ePUBs work mainly through links. The index charter states that links, “can identify single locations, multiple locations, or ranges for lengthy subject coverage in the publication’s main content.” This is not dissimilar to how a print index works.
The group break navigation data should provide links to symbol, number, or letter group breaks within an index. This is basically a way to navigate to a group of main heads in an index, for instance, all items that alphabetically begin with the letter G.
Generic cross-reference links in an index can identify and display the related semantically-marked main headings as a list. This should allow the ability to see into nested content (head structures; index heads) and display it as a list that can be navigated by the user.
An index should contain targets for the navigational system’s links within the index. This means there must be IDs or other anchors that links can locate and take the user to. For instance, cross references within the index will have targets within the index to navigate to (see also, for example). The same will be true for group breaks for letter sections (the individual letters that break up content alphabetically in a print index, for instance).
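Here is a rough Python sketch of how group breaks and their link targets could be derived from a list of main headings. The `group-…` anchor IDs and the `index.xhtml` file name are placeholders of my own; the charter only says that the targets must exist, not how they are named.

```python
def group_label(heading: str) -> str:
    """Letter group for alphabetic headings, else 'Numbers' or 'Symbols'."""
    first = heading.strip()[0]
    if first.isalpha():
        return first.upper()
    return "Numbers" if first.isdigit() else "Symbols"


def build_group_breaks(headings: list[str]) -> dict[str, list[str]]:
    """Bucket main headings under their group break, in sorted order."""
    groups: dict[str, list[str]] = {}
    for heading in sorted(headings, key=str.lower):
        groups.setdefault(group_label(heading), []).append(heading)
    return groups


def navigation_links(groups: dict[str, list[str]], index_href: str = "index.xhtml"):
    """One (label, link) pair per group break; the ID scheme is hypothetical."""
    return [(label, f"{index_href}#group-{label.lower()}") for label in groups]


groups = build_group_breaks(
    ["glossary", "Graceful fallback", "3D models", "@ sign", "anchors"]
)
print(groups["G"])  # ['glossary', 'Graceful fallback']
print(navigation_links(groups)[-1])  # ('G', 'index.xhtml#group-g')
```

Each link only works if the index document actually carries an element with the matching ID, which is the point the charter makes about targets.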
Index presentation
The charter includes some information on index presentation, but it focuses on what must be present and displayed, not how anything should be displayed. Master indexes that index multiple volumes must include links to targets in other ePUBs. This is the point of the master indexes/standalone indexes, so it seems basic.
Indexes must have the characters and numbers that act as group breaks for letter sections present in the index and marked in a machine-discoverable form. If they are going to be used as navigation and the target of links, this also seems basic.
If there are any headnotes present in the index they should be marked as such and should be presented at the beginning of the chapter-like index. Any in-line editor’s notes should be displayed in chapter-like and pop-up indexes.
Proper text alignment and indentation should be maintained in the chapter-like index. In other words, the layout should reflect the conventions that one would see in a print book. Main heads should be flush left in the column. The first level of subhead should be indented. Each additional level of subhead should be indented further. According to the charter, any special formatting of the index’s content (italics, bold, sub-, super-script, fonts, special characters) should be preserved in the index’s content. One would also hope that any special formatting of the content included in the rest of the ePUB would also be preserved.
If decorations, prefixes or suffixes are used in the index to annotate locators (daggers, ff, n, nn and so on), they must be marked up as such and machine-discoverable. A legend containing definitions or explanations of each decoration, prefix or suffix, if available, defines the description for each symbol. These decorations will be used as targets for links, so a reading system must be able to find them. And as a courtesy to readers, you should always include a legend for all symbols used in an index.
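A quick sketch shows why machine-discoverable decorations matter: once they are separable from the locator, a reading system can look each one up in the legend. The legend text and the locator pattern below are illustrative assumptions, not the charter's markup.

```python
import re

# Hypothetical legend; real meanings are defined by each publication.
LEGEND = {
    "†": "illustration",
    "n": "footnote",
    "nn": "multiple footnotes",
    "ff": "and following pages",
}

# Optional dagger prefix, a page number, then an optional suffix.
LOCATOR = re.compile(r"^(?P<prefix>†?)(?P<number>\d+)(?P<suffix>nn|ff|n)?$")


def describe_locator(raw: str) -> tuple[int, list[str]]:
    """Split a decorated locator into its number and legend descriptions."""
    match = LOCATOR.match(raw.strip())
    if match is None:
        raise ValueError(f"unrecognized locator: {raw!r}")
    marks = [mark for mark in (match["prefix"], match["suffix"]) if mark]
    return int(match["number"]), [LEGEND[mark] for mark in marks]


print(describe_locator("45n"))   # (45, ['footnote'])
print(describe_locator("12ff"))  # (12, ['and following pages'])
print(describe_locator("†7"))    # (7, ['illustration'])
```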
Reading System Behaviors
The reading system behaviors, that is, how an ePUB reader device or ePUB reader software acts with the index, mirror the use cases defined earlier. These behaviors are broken into five sections: implied/assumed, chapter-like index, pop-up index, reverse index, and standalone index. The charter does contain this caveat: “Note: the intent of this project is not to mandate reading system behaviors. The list below only serves the purpose of illustrating Reading System/Index interactions.” So the IDPF is not mandating any behaviors from eReader manufacturers; it is just listing behaviors that readers should be capable of if, you know, they want to support the index module.
I do not think it is worth explaining each of these individually, so I will just list them here:
Implied/assumed (existing functionality in EPUB readers that indexes will use)
- Reading system will properly display text encoded with special formatting, i.e. bold, italic, subscript, superscript.
- Reading system must be able to discover whether an EPUB contains one or more indexes.
- Reading system must be able to discover whether an EPUB consists of one or more standalone indexes.
- Reading system will properly display text encoded as a link, i.e. as text that can be hovered over or clicked to trigger an action (taking user to target, displaying contextual phrase, etc.)
- Reading system includes buttons or menu options to access either the chapter-like index or the pop-up directly from the text, without having to visit the table of contents.
- Reading system allows the reader to select collapsed or expanded views of the index levels (main headings only, main and subheadings, etc.).
- Reading system determines how targeted location is displayed on the screen after its link has been clicked (top of screen, middle of screen, highlighted term, highlighted range of text, blinking symbol or indicator of location, etc.)
- Reading system displays a legend for special symbols used in the index’s locator decorations in a pop-up.
Chapter-like index
- Reading system displays chapter-like index as normal pages.
- Reading system displays a floating or persistent set of group break navigational links in the chapter-like index to allow navigation to other sections of the index.
- Reading system displays floating or persistent access to headnotes.
- Reading system persistently displays applicable parent entry(ies) as user scrolls through lower-level entries, if applicable/necessary.
Pop-up index
- Reading system displays pop-up index as separate window, automatically scrolled to the term selected when it was activated (or defaults to top of index if nothing was selected)
- Reading system provides search functionality within pop-up index.
- Reading system displays floating or persistent access to headnotes.
- Reading system persistently displays applicable parent entry(ies) as user scrolls through lower-level entries, if applicable/necessary.
Reverse index
- Reading system must be able to uniquely identify multiple index targets in a selected section of text (e.g., a paragraph).
- Reading system must be able to extend that identification to include index targets whose range encompasses the selected text (e.g. a range that begins prior to the selected text and ends after the selected text)
- Reading system must be able to locate the main headings in the index associated with each of those anchors.
- Reading system must be able to display those main headings to the user.
- Reading system must be able to render each main heading as a live link to the heading’s location in the chapter-like index.
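The reverse index behaviors above boil down to an overlap test between the selected text and each index target's range. The Python sketch below assumes targets are stored as half-open character ranges, with a single-location target stored as a one-character range; that storage model and the sample data are my own assumptions.

```python
from typing import NamedTuple


class IndexTarget(NamedTuple):
    heading: str
    start: int  # inclusive character offset
    end: int    # exclusive character offset


# Hypothetical targets in one content document.
TARGETS = [
    IndexTarget("metadata", 0, 500),   # long range around the selection
    IndexTarget("anchors", 120, 121),  # single location
    IndexTarget("links", 130, 160),
    IndexTarget("fonts", 400, 420),    # outside the selection
]


def targets_for_selection(targets, sel_start: int, sel_end: int):
    """Targets inside the selection, or whose range encompasses or overlaps it."""
    hits = [t for t in targets if t.start < sel_end and t.end > sel_start]
    return sorted(hits, key=lambda t: (t.start, t.heading))


def headings_for_selection(targets, sel_start: int, sel_end: int) -> list[str]:
    """Main headings to show in the pop-up, in locator order, no repeats."""
    seen: list[str] = []
    for target in targets_for_selection(targets, sel_start, sel_end):
        if target.heading not in seen:
            seen.append(target.heading)
    return seen


print(headings_for_selection(TARGETS, 100, 200))
# ['metadata', 'anchors', 'links']
```

Note that "metadata" is included even though its range starts before and ends after the selection, which matches the second behavior in the list.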
Standalone index
- Reading system must be able to link from one EPUB to another, and have a return mechanism.
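The charter does not define how cross-ePUB linking will work, but the "return mechanism" part can at least be pictured as a simple history stack. The class below is only a sketch under that assumption; the file names and fragment IDs are invented.

```python
class CrossPublicationNavigator:
    """Tracks the current (publication, location) and a stack to return along."""

    def __init__(self, publication: str, location: str) -> None:
        self.current = (publication, location)
        self._history: list[tuple[str, str]] = []

    def follow(self, publication: str, location: str) -> None:
        """Follow an index link, remembering where the reader came from."""
        self._history.append(self.current)
        self.current = (publication, location)

    def go_back(self) -> tuple[str, str]:
        """Return to the previous position, if there is one."""
        if self._history:
            self.current = self._history.pop()
        return self.current


nav = CrossPublicationNavigator("master-index.epub", "index.xhtml#group-g")
nav.follow("volume2.epub", "ch04.xhtml#glossary")
print(nav.current)    # ('volume2.epub', 'ch04.xhtml#glossary')
print(nav.go_back())  # ('master-index.epub', 'index.xhtml#group-g')
```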
The IDPF Working Group Indexes charter is an ambitious document. It appears that the group is attempting to push ePUB indexes well beyond a simple replication of the print index with live links to the content towards a way of identifying content that will allow readers to more easily find information they are looking for within the ePUB they are reading or in other ePUBs. It will be interesting to see how it is applied and how it affects the reading experience of users.
What do you think of the information that is included in the index charter? Do you think it will help readers find information within ePUBs? Do you have any reservations about the direction the IDPF working group is moving?