GREP Cleanup of InDesign to ePUB Files from Ron Bilodeau


At the 2011 Print & ePublishing Conference (PePcon), Ron Bilodeau shared some great GREP for cleaning up ePUBs that were created using InDesign. You can download a text file of this GREP, a PDF of Ron’s presentation, and an updated PDF with additional tips for InDesign CS5.5 here. Ron posts great information on ePUBs on his Silvadeau Consulting blog.

GREP cleanup for final ePUB file:

Remove intrusive “font-size” declaration from html:
Find: font-size:.*?;
Replace:

Remove unnecessary empty “style” attribute from html:
Find: style=[\"][\"]
Replace:
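
The two steps above work as a pair: removing the font-size declaration can leave an empty style attribute behind, which the second search then removes. Here is a minimal sketch of that sequence in Python, assuming the html is already loaded as a string. The function name and the sample markup are made up for illustration and are not part of Ron’s files.

```python
import re

def strip_font_size_and_empty_style(html: str) -> str:
    """Apply the two Find/Replace steps in order (hypothetical helper)."""
    # Step 1: drop each "font-size: ...;" declaration, up to the first semicolon.
    html = re.sub(r'font-size:.*?;', '', html)
    # Step 2: drop style attributes that are empty, including ones emptied by step 1.
    html = re.sub(r'style=[\"][\"]', '', html)
    return html

sample = '<span style="font-size:12px;">Hello</span>'
print(strip_font_size_and_empty_style(sample))  # prints <span >Hello</span>
```

Note that a stray space is left inside the tag. The original GREP behaves the same way, and the markup is still valid.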

Hide Bracket to the right side of margin initials:
Find: (?<=.)\]</span>
Replace: </span><span class="hidden">\]</span>

Hide Bracket to the left side of margin initials:
Find: <span class="bracket"> \[(?=.*)
Replace: <span class="hidden">\[</span><span class="bracket">

Remove intrusive “width” attribute from html:
Find: width=[\"].*?[\"]
Replace:

Remove intrusive “height” attribute from html:
Find: height=[\"].*?[\"]
Replace:
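
If you script this cleanup rather than running it in an editor, the width and height searches can be folded into one pattern. This is a rough Python sketch under that assumption. The helper name and sample tag are invented for illustration.

```python
import re

# Same idea as the two Find lines, combined with an alternation.
# Like the originals, this has no word boundary, so it would also
# match an attribute whose name merely ends in "width" or "height".
DIMENSION_ATTR = re.compile(r'(?:width|height)=[\"].*?[\"]')

def strip_dimensions(html: str) -> str:
    """Remove width="..." and height="..." attributes (hypothetical helper)."""
    return DIMENSION_ATTR.sub('', html)

sample = '<img src="images/fig1.png" width="400" height="300" alt="Figure 1"/>'
print(strip_dimensions(sample))
```

CSS declarations such as width: 400px; are untouched, because the pattern requires an equals sign followed by a quote.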

Update encryption.xml file so that ePub will validate:
Find: Algorithm="http://ns.adobe.com/pdf/enc#RC"/
Replace: Algorithm="http://www.idpf.org/2008/embedding"/
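
Because the encryption.xml search contains no wildcards, a plain string replacement is enough when scripting it, and it avoids having to escape the dots in the URL. Here is a small sketch in Python. The element name in the sample is an assumption for illustration, and your file may differ.

```python
OLD = 'Algorithm="http://ns.adobe.com/pdf/enc#RC"/'
NEW = 'Algorithm="http://www.idpf.org/2008/embedding"/'

def fix_encryption_algorithm(xml: str) -> str:
    """Swap the Adobe algorithm URI for the IDPF one (literal match, no regex)."""
    return xml.replace(OLD, NEW)

# Assumed sample line; only the Algorithm attribute matters here.
sample = '<enc:EncryptionMethod Algorithm="http://ns.adobe.com/pdf/enc#RC"/>'
print(fix_encryption_algorithm(sample))
```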
——————————————————
GREP cleanup for ePub file being converted to Mobi:

Replace inline number images with (#):
Find: <img class=[\"].*?[\"] src=[\"].*?[\"] alt="1"/>
Replace: (1)

Find: <img class=[\"].*?[\"] src=[\"].*?[\"] alt="2"/>
Replace: (2)

Find: <img class=[\"].*?[\"] src=[\"].*?[\"] alt="3"/>
Replace: (3)

Find: <img class=[\"].*?[\"] src=[\"].*?[\"] alt="4"/>
Replace: (4)

Find: <img class=[\"].*?[\"] src=[\"].*?[\"] alt="5"/>
Replace: (5)

Find: <img class=[\"].*?[\"] src=[\"].*?[\"] alt="6"/>
Replace: (6)

Find: <img class=[\"].*?[\"] src=[\"].*?[\"] alt="7"/>
Replace: (7)
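
The seven searches above differ only in the digit, so a scripted version can capture the digit and reuse it in the replacement. This Python sketch is one way to do that and is not part of Ron’s original list. The function name and sample markup are hypothetical.

```python
import re

# One pattern for alt="1" through alt="7"; the digit is captured as group 1.
INLINE_NUMBER_IMG = re.compile(
    r'<img class=[\"].*?[\"] src=[\"].*?[\"] alt="([1-7])"/>'
)

def replace_number_images(html: str) -> str:
    """Replace each inline number image with its number in parentheses."""
    return INLINE_NUMBER_IMG.sub(r'(\1)', html)

sample = ('<p>See step <img class="num" src="images/1.png" alt="1"/> then '
          '<img class="num" src="images/2.png" alt="2"/>.</p>')
print(replace_number_images(sample))  # prints <p>See step (1) then (2).</p>
```

Many GREP-capable editors support a similar backreference in the Replace field, though the syntax (\1 or $1) varies by tool.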

Add horizontal rule <hr/> above footnotes:
Find: <div class="footnotes">
Replace: <div class="footnotes">
<hr/>

Ron Bilodeau is a graphic designer specializing in book template design and digital media. He has designed and refined interior templates for Cook’s Illustrated and O’Reilly Media, including O’Reilly’s four-color Learning series, the most recent example of which is Learning Flash CS4 Professional.

In addition to his design work, Ron is using InDesign to develop a seamless workflow for easily exporting print-ready PDFs and web-ready digital formats from the same source files. He is currently the Production and Design Specialist at O’Reilly, where he is focusing on producing valid ePub documents from InDesign, the successful results of which are being offered in some of O’Reilly’s eBook bundles. Follow Ron on Twitter at @biladew.

6 Responses to “GREP Cleanup of InDesign to ePUB Files from Ron Bilodeau”


  2. scott c says:

    Great info – a couple of those are very helpful.

    However…

    I have about 100 folders with epub “files” in them, which, of course, are actually folders with an .epub extension and the html files embedded in there. Is there any way to search within these epub “files” as a batch search and replace so I can do them all at once instead of going into each folder?

  3. Matthew says:

    Scott, which text/XML editor are you using to run GREP/RegEx? I use oXygen XML Editor. It will certainly search all the files in a single ePUB, but I haven’t tried it across multiple ePUBs in a folder. I’ll take a look and see if it is possible.

  4. scott c says:

    Matthew, thanks for the reply. I use Notepad++ and Sigil for most of my xml/regex use and textcrawler for batch find and replace operations on files/folders, and they all support multiple files/folders. The problem I’ve run into is that if you have a main folder with the epubs in that folder, programs don’t recognize that the epubs are really folders, so the programs don’t go inside them to search for additional files. I’d be happy to pony up the big $ for oxygen, or whatever software, because it would be a big time saver for me.

  5. scott c says:

    Matthew, I think I found what I’m looking for – PowerGREP. Works great searching through as many folders, subfolders, zip, epub, etc. to get to wherever the xml/html files are. Then has many search and replace options, including regular expressions.

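
For the batch question raised in the comments, a script is another option besides a dedicated tool. The sketch below assumes each .epub is a standard ZIP container and applies only the font-size cleanup as an example. The function names, the list of file suffixes, and the separate output folder are all illustrative choices, not something from the original post.

```python
import re
import zipfile
from pathlib import Path

FONT_SIZE = re.compile(r'font-size:.*?;')
TEXT_SUFFIXES = ('.html', '.xhtml', '.css')  # assumed set of files worth cleaning

def clean_epub(src: Path, dst: Path) -> None:
    """Copy src to dst, applying the font-size cleanup to text entries."""
    with zipfile.ZipFile(src) as zin, zipfile.ZipFile(dst, 'w') as zout:
        # infolist() preserves archive order, so "mimetype" stays first.
        for info in zin.infolist():
            data = zin.read(info.filename)
            if info.filename.lower().endswith(TEXT_SUFFIXES):
                text = data.decode('utf-8')  # assumes UTF-8 content
                data = FONT_SIZE.sub('', text).encode('utf-8')
            # Keep each entry's original compression (mimetype must stay stored).
            zout.writestr(info, data, compress_type=info.compress_type)

def clean_folder(folder: Path, out_dir: Path) -> None:
    """Clean every .epub file under folder, writing the results to out_dir."""
    out_dir.mkdir(parents=True, exist_ok=True)
    for epub in sorted(folder.rglob('*.epub')):
        if epub.is_file():  # skip unzipped folders that merely end in .epub
            clean_epub(epub, out_dir / epub.name)
```

Calling clean_folder(Path('epubs'), Path('epubs_cleaned')) would process every ePUB under a hypothetical epubs folder. Keep the output folder outside the input folder so cleaned copies are not picked up on a later run, and check the results with an ePUB validator before relying on them.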