From Inside the EPUB Ingestion Factory
This is a guest post from Ben Dugas at Kobo. Ben is the Manager, Content Display Quality and Production at Rakuten Kobo Inc. in Toronto.
A few times a year I’ll find myself in a conversation with Laura Brady or some other expert in the world of #eprdctn and I’ll attempt to explain how different the world of EPUB/ebook support looks when you review content and identify support gaps at a place like Kobo. Laura asked me to give this a go in blog form so here goes.
I thought the best way to go about this was by providing some insight into the content we ingest and I’ll start with the split of EPUB 2/EPUB 3 and reflowable/fixed layout. It may come as a surprise to some that EPUB 2 still makes up roughly 70% of all incoming content. Here’s the breakdown of what we’ve received so far in 2017:
EPUB 3: 16%
EPUB 3 Fixed Layout: 15% (half of which are image-only and contain no HTML, mainly comics)
EPUB 3 Open Manga Format: 1%
EPUB 2 Fixed Layout: <1%
EPUB 3 has climbed steadily over time, and EPUB 2 Fixed Layout (which was never officially in the spec) is slowly marching towards 0%, but these numbers are close to what we would have seen last year and close to what we’ll likely see next year.
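For readers curious how a split like this can be derived, here is a minimal sketch in Python. It is not Kobo's ingestion code: the function name `classify_package` and the labels it returns are hypothetical. It assumes the package document (OPF) is already in hand as a string, reads the `version` attribute on the package element, and treats an EPUB 3 `rendition:layout` value of `pre-paginated` as Fixed Layout. EPUB 2 Fixed Layout is not detected, since it was never in the spec and has no standard marker.

```python
import xml.etree.ElementTree as ET

OPF_NS = "{http://www.idpf.org/2007/opf}"


def classify_package(opf_xml):
    """Return (version label, layout label) for one package document.

    Assumption: a missing or unrecognised version attribute is treated
    as EPUB 2, which a real pipeline would probably flag instead.
    """
    root = ET.fromstring(opf_xml)
    version = root.get("version", "")
    major = "EPUB 3" if version.startswith("3") else "EPUB 2"

    layout = "reflowable"
    for meta in root.iter(f"{OPF_NS}meta"):
        is_layout = meta.get("property") == "rendition:layout"
        if is_layout and (meta.text or "").strip() == "pre-paginated":
            layout = "fixed layout"
    return major, layout


# Made-up sample package document, trimmed to the parts the sketch reads.
SAMPLE_OPF = """<package xmlns="http://www.idpf.org/2007/opf" version="3.0">
  <metadata>
    <meta property="rendition:layout">pre-paginated</meta>
  </metadata>
</package>"""

print(classify_package(SAMPLE_OPF))
```

Tallying the returned pairs across a batch of incoming files would give a breakdown of the same shape as the list above.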
When I scan to see what’s inside those EPUBs (aside from text and images), the results also show that experimentation is the exception rather than the norm. Here’s the percentage of incoming content that raises each of the flags we use to indicate that there’s more than just text and images in the EPUB (note that I have to go three decimals deep to give any indication of variance between some of these):
Embedded fonts (OTF, TTF and WOFF): 14.751%
Right to Left Reading: 10.381%
Embedded Audio: 0.103%
Read Along/SMIL: 0.027%
Embedded Video: 0.014%
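As an illustration of what raising flags like these could involve, here is a rough Python sketch that inspects an EPUB's package document. The flag names, `read_opf` and `flags_from_opf` are hypothetical and not Kobo's actual checks. It assumes fonts can be spotted by file extension in the manifest, audio and video by their manifest media types, read-along by the `application/smil+xml` media type, and right-to-left reading by `page-progression-direction="rtl"` on the spine.

```python
import xml.etree.ElementTree as ET
import zipfile

OPF_NS = "{http://www.idpf.org/2007/opf}"
CONTAINER_NS = "{urn:oasis:names:tc:opendocument:xmlns:container}"
FONT_EXTENSIONS = (".otf", ".ttf", ".woff")


def read_opf(epub_path):
    """Return the package document of an EPUB file as bytes."""
    with zipfile.ZipFile(epub_path) as zf:
        container = ET.fromstring(zf.read("META-INF/container.xml"))
        rootfile = container.find(f".//{CONTAINER_NS}rootfile")
        return zf.read(rootfile.get("full-path"))


def flags_from_opf(opf_xml):
    """Return the set of (hypothetical) enhancement flags for one package."""
    root = ET.fromstring(opf_xml)
    flags = set()
    for item in root.iter(f"{OPF_NS}item"):
        media_type = item.get("media-type", "").lower()
        href = item.get("href", "").lower()
        if href.endswith(FONT_EXTENSIONS):
            flags.add("embedded_fonts")
        if media_type.startswith("audio/"):
            flags.add("embedded_audio")
        if media_type.startswith("video/"):
            flags.add("embedded_video")
        if media_type == "application/smil+xml":
            flags.add("read_along_smil")
    spine = root.find(f"{OPF_NS}spine")
    if spine is not None and spine.get("page-progression-direction") == "rtl":
        flags.add("right_to_left")
    return flags


# Made-up sample package document for demonstration only.
SAMPLE_OPF = """<package xmlns="http://www.idpf.org/2007/opf" version="3.0">
  <manifest>
    <item id="c1" href="chapter1.xhtml" media-type="application/xhtml+xml"/>
    <item id="f1" href="fonts/Body.otf" media-type="application/vnd.ms-opentype"/>
    <item id="a1" href="audio/intro.mp3" media-type="audio/mpeg"/>
  </manifest>
  <spine page-progression-direction="rtl">
    <itemref idref="c1"/>
  </spine>
</package>"""

print(sorted(flags_from_opf(SAMPLE_OPF)))
```

For a file on disk, `flags_from_opf(read_opf("book.epub"))` would apply the same checks, with `book.epub` standing in for a real path. A manifest-only scan like this misses anything referenced purely from inside the content documents, so treat it as a first pass.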
Lastly I checked to see how many of our EPUBs have no enhancements at all and it came to:
73.7% if you define enhancements as “Fixed Layout, Right-to-Left reading, embedded fonts or any of the elements I’ve listed above” or
99.5% if you define enhancements as “any of the elements listed above but do not count Fixed Layout, Right-to-Left reading or embedded fonts.”
Now if you are someone who spends your days making EPUBs or writing specifications here’s where I’ll take a positive turn to prevent you from feeling bad about the general state of things with EPUB. The conclusions I draw from this are as follows:
- EPUB 3 has not been adopted by the majority of content creators and distributors. That’s not to say that EPUB 3 hasn’t been successful (it has expanded on functionality and allows for a greater variety of content to be produced and distributed), but EPUB 2 is still a viable container for text and images, and that still accounts for most EPUBs being produced today.
- All the experimentation is found in a very small minority of the content.
- Making a business case for increased support for enhancements is difficult to do when the basis is the content we’re already receiving.
- EPUBs that make full use of the capabilities outlined in the spec either do not exist in great volume or are not being sent to Kobo.
This may sound obvious to some but there is no magical “EPUB support” switch at Kobo we can flip whenever the spec is updated. Sometimes we get things for free (e.g. a platform or display engine we build on already supports something, so we didn’t have to build it ourselves) but more often fixing or building anything requires that we:
- Identify the need for new functionality and outline the work required
- Line up content we can test with to ensure we’ve built the thing we needed to build
- Estimate and plan the work into our development roadmap
- Design (when adding new features)
- Execute on the development required
- Test the latest release and
- Release it
Naturally, when work requires resources it’s going to need to be weighed against other work competing for the same resources. When we propose support improvements for CSS, EPUB for Education, audio/video or interactivity, the proposal is likely being measured against something unrelated to content support, such as 1) improving library management for users with large accounts, 2) adding instant previews on web, or 3) updating the walkthrough for first-time users.
It’s not unreasonably difficult to justify spending resources on content support, but it can’t be done without a concrete estimate of the potential value support improvements will bring. Will this make a difference to readers? How many titles will benefit from it? Will this content sell? How many different providers are using it? Is it going to become a commonly used element or will we see it in one batch of titles and never again? Is it in the EPUB spec? When we come across a support issue once, we log it, but it’s only when we can answer a few of those questions that we’re able to commit to having development work done.
On that note here’s what you as a content creator, distributor or specifications expert can do if you want to ensure Kobo’s development roadmap is aligned with current and emerging standards:
- If you find an instance of a valid EPUB not behaving as expected on one of Kobo’s apps or devices tell us and send in the file or a sample version of it. If the issue isn’t obvious send a detailed description and screenshot as well.
- If possible make sure it’s an EPUB that’s already loaded in the Kobo store. This way we can eliminate the issue being the result of any side-loading quirks and it’s easier to log the issue and share with our client teams.
- Tell us how many titles you know of that will trigger the same issue. We don’t expect an account of how many times similar EPUBs will ever be sent but we’ll add whatever estimate you can provide to those we get from other parties.
- Give us a sense of the degree to which this content will be supported from a sales perspective. We don’t need hard evidence that development costs will be immediately offset by the sales of new content, but if one or more parties are invested in selling content that requires support we don’t currently have, it certainly helps us prioritize the work.
That’s not to say it’s up to content creators alone to outline our support gaps. We always have projects on the go to identify issues and maintain/improve our content support but if you want to engage with an entity like Kobo that’s the input template that puts us in a position to bridge the gap between the content we support and the content being produced.
You can connect with the content display quality team at Kobo at email@example.com. Feedback is always welcome.