ePUB: The Language of eBooks —A Primer

Posted by Matthew | On: Dec 15 2011

This is a guest post from Iris Febres, a graduate student in electronic publishing at Emerson College. It is based on a PowerPoint presentation she did in her class that was also a Tweetsentation during #ePrdctn Hour.

So . . . what is EPUB?

EPUB (ePub, ePUB) is short for “electronic publication” and is a standard file format primarily used in the production of eBooks. Unlike a fixed digital format like a PDF, an ePUB allows content to reflow based on screen size or font size. An ePUB is actually a ZIP file that contains and packages content; instead of the .zip extension it uses the file extension .epub. It uses markup in HTML, XML, and CSS (though CSS is optional), and may contain images, audio, and video files.
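
Because an ePUB is a ZIP archive under a different extension, any ZIP tool can look inside one. Here is a minimal Python sketch using the standard zipfile module. The toy archive and the function names (list_epub_entries, build_toy_archive) are invented for illustration; a real ePUB will hold many more files.

```python
import io
import zipfile


def list_epub_entries(epub_bytes: bytes) -> list[str]:
    """Return the names of the files packaged inside an EPUB (ZIP) archive."""
    with zipfile.ZipFile(io.BytesIO(epub_bytes)) as archive:
        return archive.namelist()


def build_toy_archive() -> bytes:
    """Build a tiny in-memory archive shaped like an EPUB (hypothetical contents)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("mimetype", "application/epub+zip")
        archive.writestr("META-INF/container.xml", "<container/>")
        archive.writestr("OEBPS/chapter1.xhtml", "<html/>")
    return buffer.getvalue()


if __name__ == "__main__":
    for name in list_epub_entries(build_toy_archive()):
        print(name)
```

To try this on a real book, read the .epub file's bytes from disk and pass them to list_epub_entries.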

EPUB as a Standard

The ePUB specification is maintained by the International Digital Publishing Forum (IDPF), which is “the global trade and standards organization dedicated to the development and promotion of electronic publishing and content consumption.” ePUB became an official standard in 2007 and is the successor to the Open eBook format, which was formally known as the Open eBook Publication Structure (OEBPS); that is why the folder containing all the content in an ePUB is usually named OEBPS. Most ePUBs currently available are ePUB 2.0.1 files. The next version up is EPUB 3.0, which the IDPF approved as a final Recommended Specification on October 10, 2011. While the new specification is not supported by most readers at this time, you can expect to see the adoption of the new specification in early 2012.

An EPUB file is defined by three specifications:

  1. Open Container Format (OCF): The OCF specifies how the files in an EPUB are organized and packaged (the structure of the file)
  2. Open Packaging Format (OPF): The OPF defines the contents of the file as well as its appropriate metadata (the information about the file and what’s in it)
  3. Open Publication Structure (OPS): The OPS specifies the physical contents of the eBook (the actual content—what you view)

EPUB in a Nutshell

The .epub file is, in essence, a web site. It is a collection of files, including HTML/XHTML, CSS, and XML. As eBook creators, we have flexibility in design thanks in part to these different markup languages at our disposal. Do you want a specific font in your ePUB? You can do that with font embedding, too.

If the .epub file is a Web site, then the ePUB reader or eReader (the device or software app) is, in essence, a browser. And like our favorite browsers (e.g., Firefox, Safari, Opera), different readers will interpret a given EPUB file in different ways . . . much to our chagrin.

Here is an example of how three different eReaders display the table of contents of Reflections: Writing, Service-Learning, and Community Literacy. The three are Adobe Digital Editions (ADE), NOOK Study software, and iBooks on an iPhone:

TOC rendered on Adobe Digital Editions (ADE)

TOC rendered on Nook Study software

TOC rendered on iPhone

Components of an .EPUB File

An .epub file contains three primary files and folders zipped together:

mimetype (file)

The mimetype is a simple ASCII text file with a single line of text. It should sit at the top level of the archive, not in a folder. The mimetype file tells the operating system of the eReader how the eBook is formatted, that is, its MIME type. All mimetype files in ePUBs should contain the same text: application/epub+zip.

mimetype file
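
One detail worth knowing when you zip an ePUB yourself: the OCF specification expects the mimetype file to be the first entry in the archive and to be stored without compression. The sketch below shows one way to write and check that in Python. The helper names (start_epub, has_valid_mimetype, build_sample) are hypothetical, and the check covers only the mimetype entry, not full ePUB validity.

```python
import io
import zipfile

EPUB_MIMETYPE = b"application/epub+zip"


def start_epub(buffer: io.BytesIO) -> zipfile.ZipFile:
    """Open a new archive and write the mimetype entry first, uncompressed."""
    archive = zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED)
    # ZIP_STORED overrides the archive default so this one entry is not compressed.
    archive.writestr("mimetype", EPUB_MIMETYPE, compress_type=zipfile.ZIP_STORED)
    return archive


def has_valid_mimetype(epub_bytes: bytes) -> bool:
    """Check that the first entry is an uncompressed, correct mimetype file."""
    with zipfile.ZipFile(io.BytesIO(epub_bytes)) as archive:
        entries = archive.infolist()
        if not entries:
            return False
        first = entries[0]
        return (
            first.filename == "mimetype"
            and first.compress_type == zipfile.ZIP_STORED
            and archive.read(first) == EPUB_MIMETYPE
        )


def build_sample() -> bytes:
    """Build a tiny archive (hypothetical contents) with mimetype written first."""
    buffer = io.BytesIO()
    with start_epub(buffer) as archive:
        archive.writestr("META-INF/container.xml", "<container/>")
    return buffer.getvalue()


if __name__ == "__main__":
    print(has_valid_mimetype(build_sample()))
```

Writing the mimetype before anything else is the simplest way to guarantee its position, which is why start_epub does it as part of opening the archive.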

META-INF (folder)

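The META-INF folder must contain one XML file: container.xml. The container.xml file points to the contents of the eBook by giving the location of the .opf file. The folder may also hold optional files such as encryption.xml, which records the encryption used for DRM and embedded fonts. The general structure of container.xml is
<container>
     <rootfiles>
          <rootfile/>
     </rootfiles>
</container>

META-INF folder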

container.xml file
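
To see how an eReader might follow container.xml to the rest of the book, here is a small Python sketch that reads the full-path attribute of the rootfile element. The sample XML is a typical ePUB 2 container file written out for illustration, and find_package_path is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Namespace used by container.xml in the OCF specification.
CONTAINER_NS = {"ocf": "urn:oasis:names:tc:opendocument:xmlns:container"}

# Illustrative sample; the path OEBPS/content.opf is a common convention.
SAMPLE_CONTAINER = """<?xml version="1.0"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
  <rootfiles>
    <rootfile full-path="OEBPS/content.opf" media-type="application/oebps-package+xml"/>
  </rootfiles>
</container>"""


def find_package_path(container_xml: str) -> str:
    """Return the archive path of the .opf file named in container.xml."""
    root = ET.fromstring(container_xml)
    rootfile = root.find("ocf:rootfiles/ocf:rootfile", CONTAINER_NS)
    if rootfile is None:
        raise ValueError("container.xml has no <rootfile> element")
    return rootfile.attrib["full-path"]


if __name__ == "__main__":
    print(find_package_path(SAMPLE_CONTAINER))
```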

OEBPS (folder)

The OEBPS folder contains the blood and guts of your .epub file. It is where your ePUB content lives. Images, text, fonts, stylesheets—all that is stored here. It is made up of three special files—.opf, .ncx, and .css—and all the files that make up the content of the ePUB.

OEBPS folder

Image files in OEBPS folder

An OEBPS folder expanded to show all the file details

.opf (file)

The package.opf or content.opf file is an XML file that identifies all the contents of your eBook (files, images, etc.) and establishes the structure of the eBook itself. The file contains four different sections:
  • <metadata> This section contains all the metadata about the ePUB (title, subtitle, language, ISBN, author, description, subjects, publisher, publication date, copyright, price, and cover), marked up in Dublin Core (dc:).
  • <manifest> The manifest is a list of all the files included in the OEBPS folder except the .opf file. This includes the .ncx, the .css, .html/xhtml files, image files, font files, and anything else included in the ePUB.
  • <spine> The spine is a list of all the “chapters” or .html/.xhtml files in the OEBPS in the order in which they should open as a reader goes through the ePUB.
  • <guide> The guide section identifies some specific files that are used by eReaders, namely the cover and table of contents.

package.opf or content.opf file

.ncx (file)

The toc.ncx file is the Navigation Control file for XML (thus the .ncx extension). It is an XML file that handles your navigational table of contents in the eBook. This is the menu TOC that appears on the left side in ADE or through the Table of Contents menu in an eReader device. It establishes the navigation between elements of your eBook. Each item in the .ncx file has a playOrder which determines where it appears in the navigational TOC.

toc.ncx file

The toc.ncx file and toc.html file as they display in ADE

The toc.ncx file as it displays in the navigational column
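
As a rough illustration of what an eReader does with the .ncx, the Python sketch below collects each navPoint's playOrder, label, and target file from a made-up toc.ncx and sorts the entries by playOrder. The sample file and the toc_entries name are assumptions for the example; real readers handle nesting and other details that are skipped here.

```python
import xml.etree.ElementTree as ET

NCX_URI = "http://www.daisy.org/z3986/2005/ncx/"
NS = {"ncx": NCX_URI}

# Illustrative sample only; labels and file names are invented.
SAMPLE_NCX = """<?xml version="1.0"?>
<ncx xmlns="http://www.daisy.org/z3986/2005/ncx/" version="2005-1">
  <navMap>
    <navPoint id="np1" playOrder="1">
      <navLabel><text>Chapter 1</text></navLabel>
      <content src="chapter1.xhtml"/>
    </navPoint>
    <navPoint id="np2" playOrder="2">
      <navLabel><text>Chapter 2</text></navLabel>
      <content src="chapter2.xhtml"/>
    </navPoint>
  </navMap>
</ncx>"""


def toc_entries(ncx_xml: str) -> list[tuple[int, str, str]]:
    """Return (playOrder, label, target file) for every navPoint, in play order."""
    root = ET.fromstring(ncx_xml)
    entries = []
    # iter() walks the whole tree, so nested navPoints are included too.
    for point in root.iter("{%s}navPoint" % NCX_URI):
        label = point.findtext("ncx:navLabel/ncx:text", default="", namespaces=NS)
        content = point.find("ncx:content", NS)
        src = content.attrib["src"] if content is not None else ""
        entries.append((int(point.attrib["playOrder"]), label, src))
    return sorted(entries)


if __name__ == "__main__":
    for order, label, src in toc_entries(SAMPLE_NCX):
        print(order, label, src)
```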

.css (file)

The Cascading Style Sheet or .css file is the file that tells the eReader how to display different elements in an ePUB. It functions just like cascading style sheets on Web pages.

.css file

EPUB Problems/Controversies

While ePUB is intended to be a universal format for eBooks, it does have its shortcomings. It is not truly a “universal” filetype; at the moment Amazon sells more Kindle editions in the United States than all retailers selling .epub files. Publishers and eBook developers often have to tweak ePUB files so that they display correctly on each device (one EPUB for Nook, another for iPad . . .), which, you know, defeats the purpose of the filetype.
ePUB as a format is not easy to learn. It requires a decent proficiency in HTML, XML, and CSS—if you are not familiar with these, it will be tough to grasp.
ePUB sucks for displaying poetry and graphical works like comics. As a format, it has a bias toward prose works.

The Future of EPUB

ePUB as a standard is here to stay, even with Amazon’s current dominance in the eBook market. Amazon has greater competition now than a few years ago. Most importantly, readers want accessibility: one book, multiple devices. ePUB will redefine publishing workflows because most publishers use their ePUBs as the base files for Kindle conversions.
ePUB is evolving and will continue to evolve and allow greater accessibility to content. Since EPUB 3 = HTML5 + CSS3 + JavaScript, the interactivity, animation, and multimedia capabilities of ePUB will continue to expand along with publishers’ control of how their content displays on all devices. This will make ePUB the preferred format for publishers creating eBooks.

References

Tallent, Joshua. “eBook Architects Workshop: EPUB 3.” eBook Architects, Austin, TX. 20 September 2011. Presentation.

“EPUB | International Digital Publishing Forum.” International Digital Publishing Forum | Trade and Standards Organization for the Digital Publishing Industry. N.p., n.d. Web. 15 Nov. 2011. <http://idpf.org/epub>.

“MobileRead Wiki – ePub.” MobileRead. N.p., n.d. Web. 15 Nov. 2011. <http://wiki.mobileread.com/wiki/EPub>.

“MobileRead Wiki – ePub 3.” MobileRead. N.p., n.d. Web. 15 Nov. 2011. <http://wiki.mobileread.com/wiki/EPub_3>.

 

Iris Febres is a second-year graduate student at Emerson College, studying electronic publishing. She recently completed her master’s project: a digital, graphic novella optimized for electronic readers. She writes a blog on news in digital publishing, and when she’s not in class or doodling on napkins, she’s probably on Twitter where her handle is @ePubPupil. You can see images from her master’s project here.
