W3C Publishing Summit 2017: An Ebook Dev’s View
This is a guest post from Teresa Elsey from Houghton Mifflin Harcourt.
The W3C put on its first Publishing Summit on November 9 and 10 in San Francisco, as part of the W3C Technical Plenary/Advisory Committee week (TPAC), when the people who work on all the various W3C standards get together for in-person meetings.
I came as an ebook developer and interested observer; someone directly involved in W3C work would surely report on the proceedings very differently. But as a bit of an outsider, here are some of the themes I reported back to my team.
The web is everywhere …
The web and web technologies are becoming the platform for everything around us; the open web platform is the “virtual operating system” underlying everything. This was a theme throughout the talks, but it was made obvious in practice merely by all the groups besides publishing gathered there (including, but not limited to: an automotive working group, ones on Internet-connected devices, virtual reality, and web payments – everyone working on how web standards will influence their industries). At the 2012 O’Reilly Tools of Change conference, I laughed when someone said “Your next microwave will speak HTML5.” And now I’m like, of course it will.
… which includes ebooks
As we’ve been talking about for several years, the next step for ebooks is becoming more intrinsically part of the web. This is sometimes described as the moment that “ebooks become first-class citizens of the web.” This phenomenon has had many names over the years (browser friendly format, WEB-PUB), but now we’ve settled on Packaged Web Publications (PWP).
When I say “EPUB4” to my team, I am always joking *cough, cough* I mean, making a valid rhetorical point about how their work needs to be maintainable. The members of the W3C Publishing Working Group are the people actually working on that standard. EPUB4 will be a profile of PWP; that is, EPUB4s will be a specific kind of PWP. All EPUB4s will be PWPs, but there could be kinds of PWPs that are not EPUB4s.
This generated a lot of debate and discussion about the difference between ebooks and the web, à la What is the difference between “a book in a browser” and “book content on a web page”? Packaging is a quality of a book – how does the web do packaging? Who owns the UI of the book and the pages? Should the publisher supply the reading system? Do you need a different bookcase for each brand of book that you buy?
A W3C conference would of course not be complete without a plea for the attendees to get involved in the extremely important work of the W3C publishing groups. I will vouch for the many, many extremely smart and friendly people who do this work and want your help. See Publishing @ W3C for more info.
… which includes our workflows
Jeff Jaffe, CEO of the W3C, spoke on “frictionless publishing” – the idea that the web standards (HTML or XML) that we use to build ebooks could be used throughout the whole publishing life cycle, thus avoiding the multiple transformations of most publishing processes (e.g., Word > InDesign > EPUB). This idea was not brand new to me, but he added a further point: since the open web standards community is also working on what comes next, such as accessibility, video, and virtual reality, adopting tools and processes built on those standards means we get all of those things along with them.
Nellie McKesson spoke about all of this at a more practical level, describing automated publishing workflows based entirely on XML or HTML. She got into specifics: tools for transforming Word XML to HTML; book-specific HTML/XML standards like HTMLBook, EDUPUB, and DocBook; and tools for turning HTML into EPUBs, such as O’Reilly’s HTMLBook tools and Pandoc.
And throughout the two days, we got a look at tools like Vivliostyle, O’Reilly’s Atlas, and Coko’s journal tools: web-based and web-standards-based tools for creating, editing, and outputting content, both paginated and reflowable.
There’s still the same work to do
Apart from all this future-looking stuff, a number of speakers described problems and conditions that sounded very familiar to those of us in the ebook trenches. Ben Dugas of Kobo gave a fascinating talk about EPUBs from a retailer perspective. A couple of Microsoft developers spoke about lessons learned while working on Microsoft’s EPUB reader/store (the new Edge browser has a built-in EPUB reader). There was also a panel on international ebook markets.
My conclusion was that the kind of work my team does every day, such as cleaning up old ebooks, figuring out how to update the backlist, recovering from our own previous bad practices, is common across the industry. Some notes:
- Many of our familiar problems are prevalent, especially in less-mature international markets; for example, low-quality ebooks, many PDF-only, many EPUB2 files, publishers not understanding how to modify InDesign-produced EPUB files.
- The majority of EPUBs submitted by US publishers are still EPUB2. Microsoft said 90% of their US catalog is EPUB2, with 63% of titles created in 2017 still EPUB2. Existing EPUB files still have lots of problems, like faulty or nonexistent TOCs, images used for text content, poor accessibility, and bad/missing metadata.
- The fixed-layout format is still misused/overused from the retailer perspective. And FXL is not bulletproof – it will reflow if, for example, an ereader can’t access your embedded fonts.
- There are still many places where PDF is a prevalent ebook format. EPUB is not a strong brand for users. For example, many readers may choose PDFs simply because they know and recognize the format. And of course there are many documents besides books – journals, magazines, news, documentation, textbooks – that are still most commonly available only in PDF.
- Liisa McCloy-Kelley (Ebook VP at Penguin Random House) mentioned two future challenges she foresees as print, web, and digital publishing become integrated:
  - Fonts (different rights for different environments)
  - Image quality/optimization (some digital contexts now require better-than-print quality).
- One interesting tidbit: Japanese ebooks are almost all EPUB3 because EPUB2 never had sufficient language support for Japanese, and 72% are FXL (manga!). So while we know the FXL user experience needs to be better, Japanese ebook producers are hyperaware of this fact and are driving innovation in FXL performance, speed, device storage requirements, etc.
- My favorite fact to take back to my team is that Kobo says they’ve fixed font obfuscation on their readers, so we don’t have to remove font encryption on InDesign-generated books for them anymore (… which we usually learned about when our FXL books reflowed because of missing fonts, to disastrous results; see above).
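For anyone wanting to check figures like the EPUB2 and FXL percentages above against their own backlist, here is a minimal sketch in Python using only the standard library. It reads the `version` attribute from an EPUB's package document (located via `META-INF/container.xml`) and looks for the EPUB 3 `rendition:layout` property. The function name `inspect_epub` is my own invention for illustration, and the fixed-layout check covers only that one EPUB 3 property, not any older vendor-specific signals, so treat the result as a rough first pass rather than a validator.

```python
import sys
import zipfile
import xml.etree.ElementTree as ET

# XML namespaces used by the EPUB container file and the package document.
CONTAINER_NS = "urn:oasis:names:tc:opendocument:xmlns:container"
OPF_NS = "http://www.idpf.org/2007/opf"


def inspect_epub(source):
    """Return (version, is_fixed_layout) for an EPUB.

    `source` may be a file path or a binary file-like object.
    `inspect_epub` is a hypothetical helper name, not part of any library.
    """
    with zipfile.ZipFile(source) as zf:
        # container.xml points at the package document (the .opf file).
        container = ET.fromstring(zf.read("META-INF/container.xml"))
        rootfile = container.find(f".//{{{CONTAINER_NS}}}rootfile")
        if rootfile is None:
            raise ValueError("no rootfile listed in META-INF/container.xml")
        package = ET.fromstring(zf.read(rootfile.get("full-path")))

    # The package element's version attribute is "2.0" for EPUB2, "3.0" for EPUB3.
    version = package.get("version", "unknown")

    # Assumption: only the EPUB 3 rendition:layout property is checked here.
    fixed = any(
        meta.get("property") == "rendition:layout"
        and (meta.text or "").strip() == "pre-paginated"
        for meta in package.iter(f"{{{OPF_NS}}}meta")
    )
    return version, fixed


if __name__ == "__main__":
    # Usage: python inspect_epub.py book1.epub book2.epub ...
    for path in sys.argv[1:]:
        version, fixed = inspect_epub(path)
        layout = "fixed-layout" if fixed else "reflowable"
        print(f"{path}: EPUB {version}, {layout}")
```

Run over a folder of files, this gives a quick tally of how much of a catalog is still EPUB2 and how much is FXL, which is a reasonable starting point for deciding what to update first.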
AI/machine learning/algorithms are already influencing our work
AI/machine learning/algorithms came up frequently, and in examples that made me realize that the age of AI is already here. Tim O’Reilly described these transitions as happening “gradually, and then suddenly.”
- In his keynote, Tim O’Reilly (who has a new book out) talked about computers augmenting human workers – for example, the way that Uber and Lyft enable anyone to be a taxi driver by giving them GPS directions.
- Abhay Parasnis, the CTO of Adobe, mentioned that Adobe is working on using AI to make their tools (Photoshop, InDesign) dramatically less complex for people to learn to use.
- Machine learning can create metadata, too. Google, for example, is working on parsing text, images, and video to know what they are about without reference to formal metadata. Can computers know what a book is about without us providing subject headings, keywords, or descriptions? We’ve already seen this at my workplace, where companies offering to generate keywords for our titles say they use machine learning to generate/refine them.
Bonus fun game: help Google computers learn to recognize drawings.
Accessibility is core to the open web
In order for the web to be open, it has to be accessible to everyone. Both the W3C and the IDPF have always been strongly committed to accessibility, and that work will continue as they develop/maintain EPUB. Some resources:
- Web Accessibility Initiative
- Inclusive Publishing info hub
- DAISY Accessible Publishing Knowledge Base
Romain Deltour demonstrated accessibility tools being developed by DAISY, including Ace, which both performs some automated checks and extracts info for manual checking (read Romain’s introduction to the tool here).
And so much more
At this well-programmed conference, I was exposed to and am still absorbing so much more about libraries, digital preservation, scholarly publishing, international markets, metadata, and the future of the web and the world.
One bonus I can’t resist adding: Jen Simmons’s excellent talk on the graphic design possibilities posed by new CSS properties (CSS Grid!) was well received, even though it’s not immediately applicable to our ebook work. See The Experimental Layout Lab of Jen Simmons for demos.
Teresa Elsey is senior managing editor (digital) in the Trade division at Houghton Mifflin Harcourt. She directs a group that produces and updates more than a thousand ebooks yearly, including adult fiction and nonfiction, culinary and lifestyle, YA titles, picture books, and e-only projects. She began her career in print publishing (though she likes ebooks better!) and has also worked at O’Reilly Media, Let’s Go, and Cengage. She is also on the steering committee of ebookcraft.