Hey, Listen! Books with Audio, Audiobooks, and Everything In-Between

Four books with cloth spines — yellow, taupe, green, and blue — bookended by over-the-ear headphones.

This is a guest post from Wendy Reid. Wendy is an avid reader and listener who just so happens to work at the intersection of ebooks, audiobooks, and accessibility. She is a chair of the EPUB3 and Audiobooks working groups at the W3C, as well as the editor for the W3C Audiobooks specification. When she is not doing standards, she is leading digital accessibility initiatives at Rakuten Kobo. You can find her on Twitter at @wendy_a_reid.

I have often seen audiobooks, books with audio, and technologies like Text-to-Speech (TTS) lumped together as the same or similar things. This manifests in publisher concerns about audiobooks eating into book sales, or worries about TTS eating into audiobook sales. With the growing attention on accessibility and assistive technology in publishing, this has even extended into concerns about screen readers.

Before I get too deep into this, let’s establish some vocabulary:

  • Ebook: a primarily textual publication or collection of publications, which as a digital format can also contain media content like video, audio, or images
  • Audiobook: a primarily audio publication, where most of the content is exclusively audio, with the possibility of some text, images, and video as supplements
  • SMIL: also known as media overlays, Synchronized Multimedia Integration Language (SMIL) is used in EPUBs to add multimedia to text, most often audio
  • TTS: text-to-speech, a technology where text content can be read by a computer program on demand, using a synthesized voice
  • Screen Readers: a type of assistive technology that allows users who are blind, deafblind, or have cognitive or learning disabilities to read and operate computer programs, apps, and web browsers

With these definitions, let’s look at the biggest concern many people have around all of these technologies: don’t they all replace each other? If I allow TTS on my ebook, isn’t that the same as letting someone have the audiobook for the cost of an ebook? What is the point of expensive audiobook production if I can just use TTS? Is TTS an acceptable substitute for making my reading application work with screen readers?

Short answer? No.

The important thing to think about when it comes to ebooks, audiobooks, and technologies like TTS or screen readers is that their users are going to use them for different reasons. Users may sometimes use technologies interchangeably, exclusively, or not at all. The reasons for what they use and how may relate to a number of factors, including preference, accessibility needs, budget, and purpose. Understanding what each technology can and cannot do is helpful as well.

Screen readers provide a very granular view into content, including allowing navigation by line, sentence, even word. Screen readers can spell out words for a user, or allow them to move around the text using objects like headings, images, or links. A screen reader will allow a user to interact with the alternative text for an image or figure. They will sometimes highlight or outline the block of text while reading (depending on settings or software). Screen reader users can also control the speed of the output. They also exclusively use computer-generated speech, and have some options for style of voice.

TTS (text-to-speech) is very similar to screen reader output in some ways, but has far fewer features. Users can choose a preferred voice in many cases, and the speed at which content is read. With the advent of more advanced speech generation engines, some computer-generated TTS voices can sound almost natural, and there is a wide range of options allowing a user to choose the gender of a voice and even its localization. Users cannot use TTS to navigate, provide spelling, or read the alternative text of an image or figure. TTS does highlight the text it is reading, either by line or by word.

Since we are talking about computer-generated narration, it is fair to point out that this is a growing trend in audiobooks as well. As a production method it can be useful for content that a human may not enjoy reading (e.g. a glossary or copyright page), or to get an audiobook produced that would never be done with a human narrator due to cost or subject (e.g. a technical manual). However, the preference is still for human narration, and some users actively seek out preferred narrators or styles of narration, like cast productions.

Ebooks using SMIL or features like embedded audio can offer an experience similar to TTS, but with some of the detail found in audiobooks. SMIL offers highlighting and a human narrator, but usually doesn’t have the option to change speed or navigate through the book.
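
To make the highlighting idea concrete, here is a minimal sketch in Python of what a media overlay conceptually holds: a list of pairings between a piece of text and a span of narration audio. Everything in it is an assumption for illustration. The class and function names, the fragment ids, and the timings are invented, and a real EPUB expresses these pairings in SMIL markup rather than Python.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class OverlayClip:
    """One pairing of a text fragment with a span of narration audio."""
    text_id: str       # hypothetical id of a sentence or paragraph in the ebook
    clip_begin: float  # start of the narration, in seconds into the audio file
    clip_end: float    # end of the narration, in seconds into the audio file


def fragment_to_highlight(clips: list[OverlayClip], position: float) -> Optional[str]:
    """Return the id of the text fragment being narrated at `position` seconds."""
    for clip in clips:
        if clip.clip_begin <= position < clip.clip_end:
            return clip.text_id
    return None  # nothing is narrated at this position


# Invented example data: three sentences and their narration timings.
overlay = [
    OverlayClip("sentence-1", 0.0, 4.2),
    OverlayClip("sentence-2", 4.2, 9.8),
    OverlayClip("sentence-3", 9.8, 15.0),
]
```

Asking `fragment_to_highlight(overlay, 5.0)` picks out the second sentence, which is roughly what a reading system does when it highlights text in step with a human narrator.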

All of this is to say that these features are not interchangeable. They are also, with the exception of audiobooks and SMIL, in the domain of the reading system.

Reading Systems and Audio

As with most things in ereading, reading systems (the applications used to open ebooks/audiobooks) are responsible for the bulk of the user experience. Reading system developers make decisions all of the time about what to support, what features to provide, and how to provide them. Depending on the platform, these decisions can be driven by content type, user requests, market pressure, platform constraints, or experimentation.

TTS is a good example of a feature that lies directly in the realm of the reading system. A reading system can choose whether or not to build a TTS feature into its app or program. The main complication for whether a reading system builds the feature is the unique relationship reading systems have with publishers. Some publishers don’t want a feature like TTS in their books. Fear of possibly losing content, especially if the reading system is connected to a retail platform, can put a chill on plans to implement or widely release a feature like TTS. Alternatively, in a climate where some publishers are OK with the feature but others are not, creating rules within a system to block the feature for some content is technically easy. The problem for reading systems in that situation is trying to explain to the user why one book has TTS, but another does not.
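
As a rough illustration of how small such a rule can be, here is a hedged Python sketch. The `Book` class, the `tts_enabled` function, and the publisher names are all hypothetical, and a real reading system would more likely pull this from rights metadata or retailer configuration than from a hard-coded set.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Book:
    title: str
    publisher: str


def tts_enabled(book: Book, opted_out_publishers: set[str]) -> bool:
    """TTS is on unless the book's publisher has opted out (assumed rule)."""
    return book.publisher not in opted_out_publishers


# Invented publishers and titles, for illustration only.
opted_out = {"Example Press"}
catalog = [
    Book("A Novel", "Example Press"),
    Book("Another Novel", "Sample House"),
]

for book in catalog:
    status = "available" if tts_enabled(book, opted_out) else "unavailable"
    print(f"{book.title}: TTS {status}")
```

The rule itself is a single line. What the code cannot do is explain to the person holding both books why only one of them will read aloud.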

Reading systems do not have as much control over screen readers, but poor design and bad coding can lead to issues ranging from complete inaccessibility to poor experiences. TTS can sometimes be seen as a stopgap until issues with screen readers are resolved, but it shouldn’t be, since their use cases and capabilities are so different.

Reading systems are also highly interested in keeping and growing their user base. Adding features like TTS and screen reader support is one way to do that. Giving users more ways to enjoy their books is a winning strategy every time.

I would be remiss if I didn’t end this on a note about accessibility. Enabling text and audio formats for content, both separately and together, is incredibly important to accessibility!

Users read for a number of different reasons, but they also perform reading through a number of different affordances or methods. Giving users a number of different ways to interact with and use your books makes them more accessible, usable, and desirable!

A screen reader user may use TTS if they are just looking for someone to read them a story, and the text is not so complex that they might want spellings or need to navigate tables. TTS is a popular feature with people who have reading disabilities like dyslexia, where listening to a book might be easier or they would like to read along with the audio to aid comprehension. TTS can also be helpful to people learning new languages, where they can see the word and hear an accurate pronunciation.

By not allowing users to experience the content the way they wish or need to, you’re essentially telling them: “This isn’t for you.” It feels like a very quick way to lose a customer, or to put someone off of reading completely. Digital reading brought the promise of more accessibility to users with disabilities; failing to leverage that by blocking off certain features or tools would be a disappointment.

Sound off! Or, in Conclusion

Ebooks with audio, audiobooks, TTS, and screen readers are all different things, interrelated but not substitutes for one another. Each one has its pros and cons for different users, and those pros and cons can depend on a number of factors. The takeaway is that it is not for us to decide how users read our books, and since our goal should always be to ensure that as many people as possible can enjoy them, we should make that happen.
