Avoiding Generic HTML


This is the first in a series of posts on accessibility in ebooks, focussing on how to use the tools we already have to level up our ebooks. Much as it was at the advent of print publishing in the 15th century, the publishing world is at a precipice. To live up to the democratic promise of digital publishing, ebook developers need to pay closer attention to accessibility, particularly to how their markup practices affect the reading experience for a broad cross-section of readers.

InDesign is an excellent digital publishing tool but it needs to be nudged on the regular to produce clean, semantic HTML. The EPUB export straight out of InDesign may look just fine at first glance in a reader, but under the hood it is full of overrides (even from cleanly-styled source files), overburdened HTML, and – the worst! – generic HTML.

When ebooks contain only generalized structures, reading systems are limited to presenting only the basic visual form of the book. Dumb data makes for dumb reading experiences, as reading systems cannot play the necessary role of facilitator when given little-to-nothing to work with. And that’s why not everyone can read all digital content. —from Matt Garrish’s Accessible EPUB 3 (O’Reilly)

The semantically meaningless <div> soup that comes out of InDesign will fail to present as expected in some reading environments, and will actively interfere with the reading experience for some users. And as Matt points out in Accessible EPUB 3, it is a violation of the EPUB 3 spec to use generic markup where more specific tags would be semantically richer. Finally, structurally meaningful content is agile, portable, and ready to adapt to new reading systems and/or reading systems that are using your ebook in unexpected ways, such as assistive technology.
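
To make the contrast concrete, here is a minimal sketch of the same passage marked up two ways. The first fragment only imitates the kind of generic output described above; its class names are invented for illustration and are not copied from a real InDesign export. The second carries the same content in elements that say what each piece is.

```html
<!-- Generic: structure is expressed only through class names (hypothetical ones) -->
<div class="chapter">
  <p class="heading-1">Chapter One</p>
  <p class="body-text">She had read <span class="italic">Middlemarch</span> twice.</p>
  <div class="sidebar">
    <p class="body-text">A note for readers new to the period.</p>
  </div>
</div>

<!-- Semantic: the elements themselves describe the structure -->
<section>
  <h1>Chapter One</h1>
  <p>She had read <cite>Middlemarch</cite> twice.</p>
  <aside>
    <p>A note for readers new to the period.</p>
  </aside>
</section>
```

With the second version, a reading system or screen reader can offer heading navigation and treat the aside as tangential content. With the first, it has only styled boxes to go on.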

Convinced? Okay, let’s get down to it then. How on earth can we avoid the <span>/<div> madness that InDesign produces? I have a few magic tricks to add to your workflow.

Maybe you smart people already knew this, but I only learned this trick about two years ago and really wish I’d known about it much sooner. The pull-down menus inside “Edit All Export Tags” are somewhat limited. At the character style level, you can choose span, em, or strong.

InDesign users can choose one of those, but don’t: they are almost never correct! I suggest that developers type their own tags into that field instead. Rather than opting for <em> for all italicized content, consider typing “i” in that field for content that is italic by convention rather than for emphasis, so that the markup is semantically correct.

<i>Italicize me!</i>

You can also key in b, sup, cite, and small. It takes only a little effort to set up separate character styles for content that should be marked up with <i>, <em>, or <cite>, and with <b> or <strong>. (They have different uses. If you aren’t clear on where and when to use each, consider brushing up on the differences.)
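
As a rough guide to those differences, the fragment below is a sketch based on the element definitions in the HTML specification; the sentences are made up for illustration. Each example shows one tag that could be typed into the export tag field of its own character style.

```html
<!-- i: text in an alternate voice, such as a foreign phrase or taxonomic name -->
<p>The specimen was labelled <i lang="la" xml:lang="la">Felis catus</i>.</p>

<!-- em: stress emphasis that changes how the sentence is read -->
<p>I said I <em>might</em> come.</p>

<!-- cite: the title of a work -->
<p>The quotation comes from <cite>Accessible EPUB 3</cite>.</p>

<!-- b: text set apart without extra importance, such as a keyword -->
<p>Add the <b>flour</b> and stir.</p>

<!-- strong: importance, seriousness, or urgency -->
<p><strong>Do not open the oven door.</strong></p>

<!-- small: side comments and fine print -->
<p><small>Copyright notice and other fine print.</small></p>

<!-- sup: superscript required by convention, as in a unit -->
<p>The room measures 20 m<sup>2</sup>.</p>
```

The xml:lang attribute appears alongside lang because EPUB content documents are XHTML. A language attribute on <i> is only needed when the italicized text is in a different language from its surroundings.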

Similarly, you can edit the export options at the object level to get cleaner, more meaningful HTML. Instead of choosing from the dreaded generic <span> or <div> tags, you can type in more specific tags such as <section>, <aside>, <figure>, or <figcaption>.
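
For instance, an image with its caption and a sidebar might end up looking like the sketch below once the object-level tags are set. The file name, alt text, and wording are placeholders, and the classes and attributes InDesign adds on export are left out.

```html
<section>
  <h2>Field Notes</h2>
  <figure>
    <!-- placeholder image path and description -->
    <img src="images/heron.jpg" alt="A grey heron standing in shallow water"/>
    <figcaption>A grey heron at the river mouth.</figcaption>
  </figure>
  <aside>
    <p>A sidebar with background on the species.</p>
  </aside>
</section>
```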

I would strongly encourage ebook developers for whom InDesign is an important tool to incorporate practices like this into their workflow. Bear in mind that this isn’t an opportunity to invent new HTML tags; stick to the common conventions that reading systems and assistive technology are built around.

The integrity of your content depends entirely on using the right element for the job.

9 Responses to “Avoiding Generic HTML”

  1. Maryse says:

    Oh! Thank you Laura! I will definitely, positively give these tricks a try! 😊

  2. Jaume Balmes says:

    OK, all the markup I wanted to show simply disappeared from my previous comment, so let’s try some entities:
    Hi, “<i>” isn’t a semantic tag by itself (and certainly not in other languages); it needs some information attached as an attribute. For example, in Spanish only <i lang="XXX">text in XXX language</i> uses the “<i>” tag, because emphasis, voice, and quotations don’t generally use italics. I’m sure other languages with different typographic conventions have their own requirements for these less-semantic tags (like <i> and <b>).

  3. Laura Brady says:

    You are right, Jaume. What I was trying to say is that whether to use an “i” tag or an “em” tag, for example, is massively misunderstood and that most ebook developers would benefit from a refresher on where and when to use those tags.

  4. Anne Marie says:

    Excellent advice! I’m emailing Adobe InDesign engineers with a link to this.

  5. L Phillips says:

    Laura, I have been enjoying your posts and so appreciate your passion for and efforts with ID to “nudge it” into a “doing it right the first time” workflow. I am just an outsider, an interested person on the client side, and I applaud Anne Marie’s response of writing to the Adobe engineers. All of you here and in the Twitter group are well versed enough in this area of combined art & code to make a great lobbying effort with Adobe. Their evangelists are always going on about making their products better; surely they would listen to so many real-world professionals who are basically doing their trial & error for them. Now it’s time for them to incorporate time-tested, cleaner code into ID itself. I hope you will be inspired to start a call/write/petition…plead campaign! Together, you guys can really elevate the craft and help streamline your tools.

  6. Laura Brady says:

    I am pleased to hear you like these posts, Lynn. I do appreciate all the work that goes into building the software we use and how compromises need to be made. Which is why I spend time building workaround workflows.

  7. […] <figure>, <figcaption>, and <section> tags out of InDesign (explained in detail here). If you haven’t done that, edit those generic <div> tags to be more specific and […]

  8. […] ‘edit all export tags’ dialog should be familiar to us all (see this EPUB Secrets post on HTML 5, as well). But until I experimented with it for this article, I have to admit that I’d assumed […]

  9. […] will export generic HTML every time unless nudged, pushed, and bullied into doing better. An ebook built with accessibility as one of its priorities is built with more meaningful HTML 5 […]