Avoiding Generic HTML
This is the first in a series of posts on accessibility in ebooks, focussing on how to use the tools we already have to level up our ebooks. Much as it was at the advent of print publishing in the 15th century, the publishing world is at a precipice. To live up to the democratic promise of digital publishing, ebook developers need to pay closer attention to accessibility, particularly to how their markup practices affect the reading experience for a broad cross-section of readers.
InDesign is an excellent digital publishing tool but it needs to be nudged on the regular to produce clean, semantic HTML. The EPUB export straight out of InDesign may look just fine at first glance in a reader, but under the hood it is full of overrides (even from cleanly-styled source files), overburdened HTML, and – the worst! – generic HTML.
“When ebooks contain only generalized structures, reading systems are limited to presenting only the basic visual form of the book. Dumb data makes for dumb reading experiences, as reading systems cannot play the necessary role of facilitator when given little-to-nothing to work with. And that’s why not everyone can read all digital content.” (From Matt Garrish’s Accessible EPUB 3, O’Reilly.)
The semantically meaningless <div> soup that comes out of InDesign will fail to present as expected in some reading environments, and will actively interfere with the reading experience for some users. And as Matt points out in Accessible EPUB 3, it is a violation of the EPUB 3 spec to use generic markup where more specific tags would be semantically richer. Finally, structurally meaningful content is agile, portable, and ready to adapt to new reading systems and/or reading systems that are using your ebook in unexpected ways, such as assistive technology.
Convinced? Okay, let’s get down to it then. How on earth can we avoid the <span>/<div> madness that InDesign will produce? I have a few tricks you can build into your workflow.
Maybe you smart people already knew this, but I only learned this trick about two years ago and really wish I’d known about it much sooner. The pull-down menus inside of “Edit All Export Tags” are somewhat limited. At the character style level, you can choose span, em, or strong.
InDesign users can choose one of those, but I suggest you don’t pick from the list by default: the options on offer are almost never the correct ones! Developers can type their own tag into that field instead. Rather than defaulting to <em> for all italicized content, consider typing “i” in that field wherever the italics do not signal emphasis, to get semantically correct markup.
You can also key in b, sup, cite, and small. It takes a little effort, but it is worth setting up separate character styles for content that should be marked up with <i>, <em>, or <cite>, and likewise for <b> and <strong>. (They have different uses. If you aren’t clear on where and when each applies, consider sharpening your understanding of the difference.)
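As a rough illustration of why those separate character styles are worth the effort, here is a hand-written sketch of the markup you might be aiming for. The sentences are invented for this example, and it is not actual InDesign output, which will carry its own class names.

```html
<!-- Illustrative fragment only; the text is made up for this example. -->

<!-- em: stress emphasis that changes how the sentence is read -->
<p>She had <em>never</em> agreed to that.</p>

<!-- i: text set apart in a different voice, such as a ship's name -->
<p>They sailed on the <i>Mary Celeste</i> at dawn.</p>

<!-- cite: the title of a work -->
<p>Matt Garrish covers this in <cite>Accessible EPUB 3</cite>.</p>

<!-- strong: content of strong importance -->
<p><strong>Warning:</strong> back up your files before exporting.</p>

<!-- b: draws attention without implying extra importance -->
<p>Look for the <b>Export Tagging</b> panel.</p>

<!-- small: side comments and fine print -->
<p><small>All rights reserved.</small></p>

<!-- sup: superscript -->
<p>The 1<sup>st</sup> edition sold out.</p>
```

The first two paragraphs look identical in most reading systems, since both render as italics, but only the first tells assistive technology that the word carries emphasis.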
Similarly, you can edit the export options at the object level to get cleaner, more meaningful HTML. Instead of making you choose between the dreaded generic <span> and <div> tags, InDesign lets you type in more specific tags such as <section>, <aside>, <figure>, or <figcaption>.
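To make the difference concrete, here is a small hand-written sketch contrasting generic containers with their more specific equivalents. The class names, file name, and text are hypothetical, chosen only for illustration; a real export will differ in its details.

```html
<!-- Generic: the structure lives only in (hypothetical) class names -->
<div class="chapter">
  <div class="image-frame">
    <img src="images/harbour-map.jpg" alt="Hand-drawn map of the harbour"/>
    <div class="caption">The harbour as it looked in 1902.</div>
  </div>
  <div class="sidebar">
    <p>A note on the mapmaker.</p>
  </div>
</div>

<!-- Specific: the elements themselves describe the structure -->
<section>
  <h2>Chapter One</h2>
  <figure>
    <img src="images/harbour-map.jpg" alt="Hand-drawn map of the harbour"/>
    <figcaption>The harbour as it looked in 1902.</figcaption>
  </figure>
  <aside>
    <p>A note on the mapmaker.</p>
  </aside>
</section>
```

In the second version a reading system can tell that the caption belongs to the image and that the sidebar is tangential to the main text, without having to guess from class names.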
I would strongly encourage ebook developers for whom InDesign is an important tool to incorporate practices like this into their workflow. Bear in mind that this isn’t an opportunity to invent new HTML tags; stick to the common conventions that reading systems and assistive technology are built around.
The integrity of your content depends entirely on using the right element for the job.