EPUB 3.3 is here!
This is a guest post from Wendy Reid. Wendy is an avid reader and listener who just so happens to work at the intersection of ebooks, audiobooks, and accessibility. She is a chair of the EPUB3 and Audiobooks working groups at the W3C, as well as the editor for the W3C Audiobooks specification. When she is not doing standards, she is leading digital accessibility initiatives at Rakuten Kobo. You can find her on Twitter at @wendy_a_reid.
You might have seen the news that a new version of EPUB is here, and depending on your relationship with the format, the news could be exciting or stress-inducing. Why is there even a new version? What is this new version of EPUB, what has changed, and what should you be concerned about?
Why is there a new version of EPUB 3?
First, let’s take a quick detour into the why. EPUB as a format is over 20 years old, and EPUB 3 itself is over 11 years old. EPUB 3 was initially developed by an organization called the International Digital Publishing Forum (IDPF), which had also developed EPUB 2. A few years ago, IDPF agreed to merge with the World Wide Web Consortium (W3C), a standards body that develops standards for the web. It was a good fit: EPUB is a format built on web technologies like HTML and CSS, and the merger would give publishing direct access to the groups behind those technologies for better collaboration.
Shortly after the merger, the EPUB Community Group published EPUB 3.2, a revision of EPUB 3.0.1 that brought the specifications up to date and more in line with how the W3C does things. However, community groups can’t publish specifications (or in W3C parlance: recommendations), so in 2020 we formed the EPUB 3 Working Group to take the revised EPUB 3.2 through the “Recommendation Process”.
The Recommendation process is long and a bit convoluted, but to summarize, we’ve spent the last 2 years editing and improving the EPUB 3 specification, both to meet the requirements of a W3C recommendation and to address some long-standing issues with the specification itself. The one part of the process that is worth mentioning is what the W3C calls horizontal review, a step all recommendation-track documents must go through. This involves asking several other groups in the W3C to review our specification, and each group has an area of focus. The areas all recommendations are reviewed for are Accessibility, Privacy, Security, Internationalization, and overall architectural fit with the web. These reviews had a significant impact on the latest version of EPUB 3.
A brief note
Before we get too far into this topic, I want to be really clear about one major point:
EPUB 3.3 is completely backwards compatible with previous versions of EPUB 3. In fact, if you are currently producing EPUB 3 files that pass EPUBCheck and other validation, you are likely already producing EPUB 3.3 files.
If you are worried about the impact this release has on your current publishing processes, take a deep breath. Everything is ok!
Now that our heart rates are back to normal, let’s get into the details.
What has changed?
New Document Structure
EPUB 3.3 has made a number of changes to previous versions of EPUB 3. Most of these changes are editorial: we put a big focus on clarifying the language in the specification and attempting to remove some of the contradictions or confusing areas of the documents.
The biggest editorial change you’ll notice if you look at the documents is that we’ve completely restructured them. In EPUB 3.2, there were five different documents:
- EPUB 3.2
- EPUB Packages 3.2
- EPUB Content Documents 3.2
- EPUB Open Container Format 3.2
- EPUB Media Overlays 3.2
If you were a content creator or reading system developer, you needed to read all of these to understand what you needed to do, and you likely needed to hop around between documents to get the full picture.
In 3.3, we have reorganized the requirements into two documents:
- EPUB 3.3
- EPUB 3.3 Reading Systems
EPUB 3.3 (or “Core” as we affectionately call it), focuses on requirements for EPUB creators. It combines all of the requirements in the 5 documents relating to how content is assembled and organized into one document. EPUB 3.3 Reading Systems does the same thing, but for all requirements relating to the processing and presentation of EPUB content.
New Content Types
In EPUB 3.3 we have added two new content types (file formats):
- WebP, a modern image format for the web, which allows for smaller file sizes with the same quality as JPEG or PNG
- OPUS, an open source audio codec also designed for the web, with streaming, storage, and a wide range of support for different bitrates and sampling rates
Adding these means that content creators are welcome to use them. EPUB’s approach to file formats is to add them once we are confident of general support for the format. We waited on these two in particular because of limited support until recently. If you are not familiar with WebP or OPUS, both formats are optimized for size and quality, particularly for making websites load faster. If file size is a major concern for you in EPUB, it is worth looking into using these formats, though we still recommend testing.
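For creators wondering what using these formats looks like in practice, here is a minimal Python sketch of declaring such files in the package document manifest. The media type strings are the ones I understand the EPUB 3.3 core media types list to use for WebP and OPUS, so please check them against the specification. The helper name manifest_item, the file names, and the .opus extension mapping are assumptions made up for this illustration.

```python
from pathlib import PurePosixPath
import xml.etree.ElementTree as ET

# Assumed mapping from file extension to media type. The values are meant to
# match the EPUB 3.3 core media types list; verify them against the spec.
MEDIA_TYPES = {
    ".webp": "image/webp",
    ".opus": "audio/ogg; codecs=opus",
}


def manifest_item(item_id, href):
    """Build a manifest <item> element for a WebP or OPUS resource.

    Hypothetical helper for illustration only; not part of any EPUB tool.
    """
    suffix = PurePosixPath(href).suffix.lower()
    if suffix not in MEDIA_TYPES:
        raise ValueError(f"no media type known for {href!r}")
    return ET.Element(
        "item",
        {"id": item_id, "href": href, "media-type": MEDIA_TYPES[suffix]},
    )


# Example usage with made-up file names.
for item_id, href in [("cover", "images/cover.webp"), ("ch1-audio", "audio/ch1.opus")]:
    print(ET.tostring(manifest_item(item_id, href), encoding="unicode"))
```

The resulting item elements would sit in the manifest alongside your existing JPEG, PNG, or MP3 entries. As noted above, testing on the reading systems you care about is still recommended.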
Privacy and Security
One of the major benefits of the W3C horizontal review process is the chance to have your work reviewed by experts in the areas included in the review. While EPUB is well known for its accessibility and support for international languages, we’ve lagged in other areas, mostly because of a lack of access to experts. The main area that EPUB has been weak in is privacy and security.
We have now added specific sections to both documents regarding privacy and security considerations for EPUB 3. These sections contain recommendations for both content creators and reading systems to help improve the overall security and privacy of EPUB files.
Since this section is new, I do recommend people review it. It is in both EPUB 3.3 and EPUB 3.3 Reading Systems. The information there is quite comprehensive, but let’s discuss some highlights here.
We performed a threat model analysis of EPUB as part of the work to write these sections. A threat model explores all of the possible ways an attacker can exploit security or privacy issues within a file or piece of software. For EPUB, we identified a few areas of particular concern:
- Scripting
- Compromised or malicious remote resources
- Phishing/spoofing
- Collection of user data
- User-generated content
Scripting broadly refers to the use of JavaScript in EPUB files. JavaScript is a programming language commonly used on the web to build more interactive experiences, and it has many uses in EPUB, from building interactivity to animations. It can create security issues if used improperly, particularly through collecting user information or activating features in browsers or reading systems like geolocation or your microphone/video.
Compromised or malicious remote resources refer to files that live outside the EPUB that might be referenced within it. These can pose a security threat in a few different ways, either because the content itself is malicious (ex. a file that contains a virus), or the source is compromised (ex. the URL is taken over and malicious content is placed on it).
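If you want a rough sense of whether your own content documents reference remote resources, a check along these lines can help. This is only a sketch using Python’s standard library. The class name RemoteResourceFinder, the function find_remote_resources, and the list of elements and attributes checked are assumptions chosen for illustration, not a list taken from the specification, and the check does not cover every way a resource can be referenced (CSS, for example, is ignored).

```python
from html.parser import HTMLParser
from urllib.parse import urlparse

# Element/attribute pairs that load a resource into the page. This list is an
# illustrative assumption. Ordinary links (<a href>) are navigation, not
# embedded resources, so they are deliberately left out.
RESOURCE_ATTRS = {
    ("img", "src"),
    ("audio", "src"),
    ("video", "src"),
    ("source", "src"),
    ("script", "src"),
    ("iframe", "src"),
    ("link", "href"),
}


class RemoteResourceFinder(HTMLParser):
    """Collects (tag, url) pairs for resources loaded over http or https."""

    def __init__(self):
        super().__init__()
        self.remote = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if (tag, name) not in RESOURCE_ATTRS or not value:
                continue
            if urlparse(value).scheme in ("http", "https"):
                self.remote.append((tag, value))


def find_remote_resources(markup):
    """Return the remote resource references found in one content document."""
    finder = RemoteResourceFinder()
    finder.feed(markup)
    finder.close()
    return finder.remote


# Example usage with a made-up content document.
sample = (
    '<html><body><img src="images/a.png"/>'
    '<img src="https://example.com/b.png"/>'
    '<a href="https://example.com/">a link</a>'
    '<script src="http://example.com/t.js"></script></body></html>'
)
for tag, url in find_remote_resources(sample):
    print(tag, url)
```

Anything this turns up is worth a second look: ask whether the file could be packaged inside the EPUB instead, and whether you trust and control the place it is hosted.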
Phishing/Spoofing refers to two methods for tricking users into providing personal information. You’ve likely heard of phishing schemes, or seen them in your email or text messages. Spoofing is similar, where someone pretends to be an entity you trust. For EPUB, this could look like ebooks containing pages that ask for information pretending to be entities like the publisher or retailer you purchased it from.
Collection of User Data is a major privacy concern, and for EPUB, a challenging area. User data collection is central to many businesses on the web, but for ebooks in particular, some data is important to know. Bookmarking, book progress, recommendations, these all rely on information about a user’s reading habits. We recommend that content creators avoid including anything requesting user data unless it’s essential or clear what the data is needed for, and for reading systems to inform the user of data usage, purpose, and provide opt-outs where possible.
User-generated Content is another area of concern. Some EPUBs have the ability for users to fill in fields or enter text, like in a quiz book or educational content. This presents a security threat, as these text-entry fields or file-upload options can open the reading system or device to malicious code or files.
Security and Privacy is a publishing industry concern that is not isolated to EPUB files. It also extends to the ecosystem they operate in, including distributors, publishers, retailers, and reading systems. We can all work together to make publishing and EPUB safe, secure, and protective of user privacy!
What happens now?
The EPUB 3 Working Group is now asking members of the community to help us with testing the EPUB specification. In the recommendation process, tests are used to confirm whether the assertions made in a specification (any time we use language like MUST or SHOULD) are actually implemented in the real world. EPUB has many implementations out there, and we want to test as many of them as we can.
It is also helpful for us to have people from the community review the documents themselves. At this point, we will not make any major changes to EPUB (like adding or removing a feature), but as one of our goals was to improve the readability of the specification, feedback on language is very helpful. We want to make sure people can clearly understand how to build EPUB files and reading systems.
If you have any questions about EPUB 3.3, want to help, or are just curious about what is going on in publishing standards, please feel free to reach out to the chairs of the group (myself, Dave Cramer, and Shinya Takami) at group-epub-wg-chairs@w3.org.