Why and How to Convert to EPUB 3

Pixelated text reading "Level Up", with an arrow on the first ascender in the "u" in up.

This guest post is by Nicole Lambe, an ebook developer at House of Anansi Press in Toronto. She is a graduate of Humber College’s Creative Book Publishing Program and a passionate accessibility advocate. She is on Twitter at @xnicolelambe.


In today’s world, one thing we can all agree on is that technology is ever-changing, and the need to adapt is inevitable. What once worked for e-readers around the world has changed as we move towards more accessible and user-friendly content. Many older ebooks, and even some ebooks produced today, are only available in the outdated and inaccessible EPUB 2 format. If you need help understanding why EPUB 3 is the better choice, see: http://epubsecrets.com/epub-2-or-epub-3-that-is-not-a-question.php

Why Convert?

There are myriad benefits to moving content from EPUB 2 to EPUB 3, particularly from an accessibility point of view.

  1. Users are able to view the content more easily, making the overall reading experience more enjoyable, more navigable and less of a hassle. When a reader has a positive experience with an ebook, they are more likely to continue reading digitally, which increases sales.
  2. Meaningful HTML markup is easier for screen readers and other assistive technologies to interpret. In Canada alone, there are over 3 million individuals with print disabilities. By making some simple adjustments to our content, we are able to tap into this market and help create more inclusive content.
  3. It is much easier to go back and edit the cleaned-up EPUB 3 HTML and CSS code in the future if need be for other projects. Although this can initially be tedious, it pays off in future endeavours if everything is cleaner, organized and has meaningful markup.

EPUB 3 is, simply put, a more agile, cleaner, more accessible format.

Exporting to EPUB 3

When starting the task of remediating an EPUB file from 2 to 3, the first thing to do is to convert it into the bare bones of an EPUB 3. This can be done in a number of ways, but the easiest and most efficient is the ePub3-itizer plugin, which can be installed in Sigil.

A view of the Sigil interface, highlighting the ePub3-itizer plugin.

Once the file is exported to EPUB 3, you get the basic outline of how it will start to look. The new EPUB 3 file will include more workable navigation (using the toc or nav.xhtml file rather than the toc.ncx); HTML5 and CSS3 capabilities; more useful and sophisticated metadata on accessibility, titles, authors, etc.; and the ability to insert audio and video. This conversion is just the beginning, but it already opens up a plethora of possibilities.

After conversion I like to immediately rename all the files in Sigil, giving them meaningful names and making sure they all use the extension .xhtml, not .html (e.g. Chapter-1.xhtml, Cover.xhtml, etc.). Doing this helps keep files organized and serves you better when editing the code afterwards.

CSS Bloating

After exporting, I open the CSS file in my preferred coding software (I use Oxygen XML Editor) and try to combat an overly expansive stylesheet. This cleanup can be exhausting, but it is nothing a little RegEx and TLC can’t fix. More often than not, CSS files are a result of what has been exported from the older print InDesign file and carry nuances that are excessive for the ebook. Cleaning up CSS bloat is one of the most beneficial parts of conversion because it lays the foundations for the rest of the file.

Generally, these are what I look to eliminate from the CSS:

  • Widows and orphans
  • Page-breaks
  • Any text specifications with the value of “none”
  • Font-sizes that are for general text (e.g. paragraph text, copyright, etc.)
  • Any colour indications that wouldn’t read well on different screen settings (e.g. black, light blue, etc.)
  • Any overrides (paragraph or character)
  • Classes for <span> tag italics or small caps
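
The first two items on this list lend themselves to the "little RegEx" mentioned above. What follows is only a minimal sketch in Python, not the workflow used in Oxygen: the function name `strip_declarations` and the sample rule are made up for illustration, and the pattern assumes one declaration per line.

```python
import re

# Properties named in the list above; extend the tuple for your own stylesheet.
PRINT_ONLY = (
    "widows",
    "orphans",
    "page-break-before",
    "page-break-after",
    "page-break-inside",
)


def strip_declarations(css, properties=PRINT_ONLY):
    """Delete every `property: value;` line whose property is in `properties`.

    Assumption: one declaration per line. A declaration that shares its line
    with other declarations is left untouched.
    """
    names = "|".join(re.escape(name) for name in properties)
    pattern = re.compile(
        rf"^[ \t]*(?:{names})[ \t]*:[^;}}]*;?[ \t]*\r?\n?",
        re.MULTILINE | re.IGNORECASE,
    )
    return pattern.sub("", css)


# Hypothetical rule of the kind a print export might leave behind.
sample = (
    "p.body {\n"
    "  widows: 2;\n"
    "  orphans: 2;\n"
    "  page-break-after: avoid;\n"
    "  text-indent: 1.5em;\n"
    "}\n"
)
cleaned = strip_declarations(sample)  # only text-indent survives in the rule
```

The remaining items (overrides, italic and small-caps classes) depend on how each file was exported, so they are better reviewed by hand than stripped automatically.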

And I look to change these things:

  • Anything that is in pixels, I convert to ems
  • If there is only one font specified, I add other common e-reader fonts as a means of catering to different devices
  • Simplify margin declarations to “margin: 0 0 0 0;” rather than explicitly formatting all 4 margins

Nineteen lines of CSS are pared down to eight critical pieces.
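
Two of these changes are mechanical enough to sketch in code. The Python below is an illustration only: `px_to_em` and `collapse_margins` are hypothetical helper names, and the conversion assumes that 1em equals 16px, which is a common default but not something every stylesheet or reading system guarantees.

```python
import re

BASE_FONT_PX = 16  # assumption: 1em corresponds to 16px


def px_to_em(css, base_px=BASE_FONT_PX):
    """Rewrite every pixel length in `css` as an em length."""
    def convert(match):
        ems = float(match.group(1)) / base_px
        # Three decimals is plenty; trim trailing zeros so 1.500 becomes 1.5.
        return f"{ems:.3f}".rstrip("0").rstrip(".") + "em"

    return re.sub(r"(\d*\.?\d+)px\b", convert, css)


def collapse_margins(declarations):
    """Fold the four margin-* longhands into one `margin` shorthand.

    `declarations` maps property names to values for a single rule. The
    shorthand order is top, right, bottom, left. If any of the four sides
    is missing, the rule is returned unchanged.
    """
    sides = ("margin-top", "margin-right", "margin-bottom", "margin-left")
    if not all(side in declarations for side in sides):
        return dict(declarations)
    result = {k: v for k, v in declarations.items() if k not in sides}
    result["margin"] = " ".join(declarations[side] for side in sides)
    return result


converted = px_to_em("font-size: 24px; margin-left: -8px;")
rule = collapse_margins({
    "margin-top": "0",
    "margin-right": "0",
    "margin-bottom": "0",
    "margin-left": "0",
    "text-indent": "1.5em",
})
```

Converting every pixel value is the blunt version of the rule; values such as hairline borders may deserve a second look before they are changed.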

HTML Mark Up

Once the CSS file is cleaned up, I begin to work through my entire array of files. The most important part of the whole conversion job is making the HTML markup meaningful. When converted out of EPUB 2, almost everything is contained in <div> tags, and many headers are reduced to regular <p> tags. While this might present the book in a way that we find visually acceptable, it is important to make markup meaningful so that assistive technologies are told exactly what the book is trying to convey. Proper structural hierarchy and markup are key to telling what is what. Without them, chapter names can read as sentences, italics as regular words without emphasis, and so on.

I use the following HTML5 tags to help combat this:

<section> — for body text and chapters, generally in place of a <div> tag on an entire chapter.

<aside> — for sidebars, used in place of <p> tags.

<figure> — for images, used in place of <div> tags that contain an image.

<figcaption> — for image captions, used within the <figure> tags.

<blockquote> — for larger quotes that are not in paragraphs, but on their own.
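
To make the substitutions above concrete, here is a rough Python sketch using the standard library's ElementTree. Everything specific in it is an assumption: the class names `chapter`, `chapter-title` and `caption` are hypothetical stand-ins for whatever the export produced, promoting the title to <h1> is one possible choice of heading level, and the sample is a namespace-free fragment rather than a full XHTML file.

```python
import xml.etree.ElementTree as ET


def add_structure(fragment, wrapper_class="chapter",
                  title_class="chapter-title", caption_class="caption"):
    """Swap generic <div>/<p> tags for structural HTML5 tags.

    - <div class="chapter">      becomes <section>
    - <p class="chapter-title">  becomes <h1>
    - a <div> with an <img> child becomes <figure>,
      and its <p class="caption"> becomes <figcaption>
    """
    root = ET.fromstring(fragment)
    for el in root.iter():
        classes = el.get("class", "").split()
        if el.tag == "div" and wrapper_class in classes:
            el.tag = "section"
        elif el.tag == "p" and title_class in classes:
            el.tag = "h1"
        elif el.tag == "div" and el.find("img") is not None:
            el.tag = "figure"
            for child in el.findall("p"):
                if caption_class in child.get("class", "").split():
                    child.tag = "figcaption"
    return ET.tostring(root, encoding="unicode")


before = (
    '<div class="chapter">'
    '<p class="chapter-title">One</p>'
    '<div class="image"><img src="map.jpg" alt="A map"/>'
    '<p class="caption">A map</p></div>'
    '<p>Text.</p>'
    '</div>'
)
after = add_structure(before)
```

Only tag names change; classes and text are carried over untouched, so the existing CSS keeps working until the selectors are tidied up as well.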

Although these character-level fixes may seem obvious, they occur in almost every file I encounter:

<i> or <cite>: <span class="italic"> or something similar confuses a screen reader and does not convey that the word is italicized or cited. Using the proper tag is essential and more semantically meaningful. I also make sure a language attribute is included on the tag if the italicized word or phrase is in another language.

<small>: small caps are often rendered as <span class="small-caps">. A better solution is to use the purpose-built <small> tag, with CSS making sure the tag capitalizes the text. Blitz CSS is a terrific source of well-tested CSS solutions.

I also make sure the HTML language declarations are only on the root <html> tag, and not on <body> tags throughout the text. The only other place I use lang attributes is for language shifts in the text, so on <i>, <cite> or <span> tags for words or phrases that are in other languages.
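
The span-to-tag swap described above is a typical search-and-replace job. A minimal Python sketch follows, with some loud assumptions: the class is literally `italic`, `class` is the first attribute on the span, attribute values use straight double quotes, and italic spans are not nested inside one another. The function name is hypothetical, and whether a given phrase should become <i>, <em> or <cite> remains an editorial decision.

```python
import re

# Matches <span class="italic" ...>text</span> and captures any extra
# attributes (for example xml:lang="fr") so they can be carried over.
SPAN_ITALIC = re.compile(
    r'<span\s+class="italic"(?P<attrs>[^>]*)>(?P<text>.*?)</span>',
    re.DOTALL,
)


def replace_italic_spans(xhtml, tag="i"):
    """Replace italic spans with `tag`, keeping language attributes."""
    def swap(match):
        return f"<{tag}{match.group('attrs')}>{match.group('text')}</{tag}>"

    return SPAN_ITALIC.sub(swap, xhtml)


foreign = replace_italic_spans(
    '<p>She said <span class="italic" xml:lang="fr" lang="fr">bonjour</span>.</p>'
)
title = replace_italic_spans(
    '<p>I reread <span class="italic">Emma</span>.</p>', tag="cite"
)
```

Running it once per intended tag, on a reviewed selection rather than the whole book, keeps the choice between emphasis, citation and foreign phrase in human hands.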

Before: unremediated HTML including the header
After: cleaned-up HTML showing HTML 5 implementation and correct header.

Landmarks

Including Landmarks in the navigation is essential, as it helps centralize and locate things that are not necessarily included in the Table of Contents. By expanding the places where different things can be found, devices can better read what is going on within a publication. For example, if the device can pinpoint the start of content through the landmarks, the EPUB will open there instead of at the cover. In the Landmarks I include, but am not limited to:

  • Front matter (Cover, Title Page, etc.)
  • Body matter (Start of Content)
  • Back matter (Acknowledgments, About the Author, etc.)
  • TOC
  • List of Illustrations (or other media)
  • Bibliography
  • Index
  • Glossary
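
For readers who have not seen one, a landmarks list is a <nav epub:type="landmarks"> element inside the navigation document. The Python sketch below generates one from a list of entries. The epub:type values come from the EPUB structural semantics vocabulary; the file names and labels are hypothetical, and the function name `landmarks_nav` is invented for this example.

```python
from html import escape


def landmarks_nav(entries):
    """Build a landmarks <nav> from (epub_type, href, label) tuples.

    The epub: prefix only works if the navigation document declares
    xmlns:epub="http://www.idpf.org/2007/ops" on its root <html> element.
    """
    items = "\n".join(
        f'    <li><a epub:type="{escape(kind, quote=True)}" '
        f'href="{escape(href, quote=True)}">{escape(label)}</a></li>'
        for kind, href, label in entries
    )
    return (
        '<nav epub:type="landmarks">\n'
        "  <ol>\n"
        f"{items}\n"
        "  </ol>\n"
        "</nav>"
    )


# Hypothetical file names, following the Chapter-1.xhtml naming style.
nav = landmarks_nav([
    ("cover", "Cover.xhtml", "Cover"),
    ("titlepage", "Title-Page.xhtml", "Title Page"),
    ("toc", "nav.xhtml", "Table of Contents"),
    ("bodymatter", "Chapter-1.xhtml", "Start of Content"),
    ("bibliography", "Bibliography.xhtml", "Bibliography"),
    ("index", "Index.xhtml", "Index"),
])
```

The `bodymatter` entry is the one that lets a reading system open the book at the start of content.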

Accessibility Metadata

Having accessibility metadata is extremely important, as it tells screen readers and other assistive devices what is going on in a book and warns of issues that may arise.

The full suite of accessibility metadata.

As you can see in this image, there is a lot going on. Each individual line indicates something the text does. For example, the image above says that there will be both textual and visual content, that there is structural navigation, that photos will have alternative text and that ARIA roles are used to signify different sections. Then it goes on to how the book can be used: with a keyboard, a mouse, and a touch screen. After this, it indicates that no content will flash, make noise or simulate motion, so that readers know whether the book may be triggering or hazardous for any conditions they may have, such as epilepsy. And finally, the summary statement can say which standards the EPUB conforms to. Overall, this provides a very useful set of metadata for all sorts of readers. See the DAISY Knowledge Base for a thorough explanation of every piece of this metadata.
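
Since the image is not reproduced in text, here is a reconstruction of what such a block can look like, written as a small Python generator. It is a sketch built from the description above, not a copy of the original screenshot: the property and value names are the schema.org accessibility terms used in EPUB package metadata, the summary text is a placeholder, and `opf_meta` is a hypothetical helper name.

```python
from html import escape

# One (property, value) pair per <meta> line, in the order described above.
ACCESSIBILITY = [
    ("schema:accessMode", "textual"),
    ("schema:accessMode", "visual"),
    ("schema:accessibilityFeature", "structuralNavigation"),
    ("schema:accessibilityFeature", "alternativeText"),
    ("schema:accessibilityFeature", "ARIA"),
    ("schema:accessibilityControl", "fullKeyboardControl"),
    ("schema:accessibilityControl", "fullMouseControl"),
    ("schema:accessibilityControl", "fullTouchControl"),
    ("schema:accessibilityHazard", "noFlashingHazard"),
    ("schema:accessibilityHazard", "noSoundHazard"),
    ("schema:accessibilityHazard", "noMotionSimulationHazard"),
    # Placeholder: write a real plain-language summary for each book.
    ("schema:accessibilitySummary",
     "Replace with a short summary of this book's accessibility."),
]


def opf_meta(pairs):
    """Return one <meta> element per pair, ready for the OPF <metadata>."""
    return [
        f'<meta property="{escape(prop, quote=True)}">{escape(value)}</meta>'
        for prop, value in pairs
    ]


lines = opf_meta(ACCESSIBILITY)
block = "\n".join(lines)
```

Each claim has to be true of the finished file, so the list should be edited per title rather than pasted in unchanged.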

Conclusion

Converting to EPUB 3 may seem like a mundane, slightly irritating task, but who isn’t in favour of a better user experience and of publishing more inclusively? With technology in constant evolution, it is a no-brainer to try to keep our content up to date. Publishers get books that are easier to update, readers get ebooks that are easier to use — a win-win!
