PDF manipulation... oh my god!

I am using Leanpub to create some manuals, and it's a very nice service. You just write your document in Markdown and it generates a good-looking PDF from it. Well, from what I've read today, it's actually kramdown, a Markdown superset written in Ruby. Everything was fine until my partners wanted to add a custom header image to the manual. Leanpub doesn't support this; in fact, as far as I know, Markdown doesn't either. So I decided to manipulate the PDF using my preferred OS (Linux). It's not easy!

Pdftk (the pdf toolkit) is great!

pdftk is a powerful command-line tool for manipulating PDFs, and it's the tool that eventually saved me. Thanks, Sid Steward! Here's what I did:

  1. Use LibreOffice to insert the header image into an empty document, then export it as PDF (header.pdf)
  2. Use LibreOffice to create a cover for the book (cover.pdf)
  3. Remove the first two pages from my_book.pdf, the file generated by Leanpub:
    pdftk my_book.pdf cat 3-end output tmp.pdf
  4. Add the cover:
    pdftk cover.pdf tmp.pdf cat output tmp2.pdf
  5. Add the background:
    pdftk tmp2.pdf background header.pdf output my_final_book.pdf
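Steps 3 to 5 above can be collected into one reusable shell function, which is handy when the manual gets regenerated. This is a minimal sketch, assuming pdftk is on the PATH, that cover.pdf and header.pdf have already been exported from LibreOffice, and that the Leanpub PDF still has two leading pages to drop; the function name `build_book` and the temporary-directory handling are illustrative choices, not part of pdftk.

```shell
#!/usr/bin/env bash
# Sketch of steps 3-5 above as one function (hypothetical name: build_book).
# Assumes pdftk is installed and the cover/header PDFs were exported beforehand.

# The body runs in a subshell ( ... ) so the EXIT trap removes the
# temporary files even if a pdftk call fails under set -e.
build_book() (
  set -eu
  book="$1"    # PDF generated by Leanpub
  cover="$2"   # cover.pdf from LibreOffice
  header="$3"  # header.pdf from LibreOffice (pdftk uses only its first page)
  out="$4"     # final output file

  tmpdir="$(mktemp -d)"
  trap 'rm -rf "$tmpdir"' EXIT

  # Step 3: drop the first two pages, keeping page 3 through the end.
  pdftk "$book" cat 3-end output "$tmpdir/tmp.pdf"

  # Step 4: concatenate the cover in front of the trimmed book.
  pdftk "$cover" "$tmpdir/tmp.pdf" cat output "$tmpdir/tmp2.pdf"

  # Step 5: place header.pdf behind every page of the combined document.
  pdftk "$tmpdir/tmp2.pdf" background "$header" output "$out"
)
```

Usage: `build_book my_book.pdf cover.pdf header.pdf my_final_book.pdf`. Because pdftk's `background` operation applies the first page of header.pdf to every page of its input, this order also puts the header behind the cover; if that isn't wanted, run the background step on the trimmed book before prepending the cover. The pdftk manual also notes that a background only shows through pages with a transparent background; its `stamp` operation overlays the page on top instead.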

The pdftk documentation has many more powerful examples.
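pdftk can also report on a file, which gives a quick sanity check after the steps above: its `dump_data` operation prints metadata as `Key: Value` lines, including `NumberOfPages`. The helper below is a hypothetical convenience (the name `parse_page_count` is illustrative, not part of pdftk). Assuming a one-page cover, the final book should have one page fewer than the Leanpub original: two removed, one added.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read pdftk's dump_data report on stdin and print
# the value of its NumberOfPages line.
parse_page_count() {
  awk -F': ' '$1 == "NumberOfPages" { print $2 }'
}

# Usage (requires pdftk and the files from the steps above):
#   pdftk my_book.pdf dump_data | parse_page_count
#   pdftk my_final_book.pdf dump_data | parse_page_count
```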

Things that didn't work so well:

  • Xournal: this tool is great when it comes to adding text to a PDF document, among other things. It's probably the best GUI PDF tool I've tried. However, it doesn't support adding images or headers. There is a patch to insert images, but I didn't spend the time trying to compile it.
  • PDFedit: looks powerful, but I couldn't figure out how to use it. I could remove text from the document easily, but nothing more.
  • uPDF: looks interesting, but it's buggy and feels experimental. It didn't work for me: it froze when saving the document, and the GUI is quite hard to use.
  • PDF Mod: this one looks very good! But I only found out about it after I had already solved the problem, so I didn't try it. The documentation says it modifies PDFs, but I don't know whether it supports headers/backgrounds and things like that.
  • LibreOffice-pdfimport: Right, it opens the PDF document, but it loses the formatting and images, at least for my PDF book.
  • Pandoc: In desperation I tried to skip Leanpub and generate the PDF myself from the Markdown text. Pandoc is a brilliant and very powerful converter. Together with LaTeX Beamer, it generated a PDF for me:
    pandoc -t beamer -o my_book.pdf my_book.txt
    The problem was that I don't have a nice LaTeX template to use, so I lost all the nice formatting provided by Leanpub.

I also tried two commercial tools for Windows, but neither of them was very good either. The prices were reasonable, so I thought I would just buy one, but the trial versions were enough to show me the software wasn't good.

This story took me way more time than I expected. I hope reading this saves you some time if you face the same problem 🙂


  • Iván Stepaniuk

    Bummer! The thing is PDFs are not designed to be edited or reformatted (that’s why it’s so difficult to transform them for ebook reading). The semantics of the documents such as paragraphs, line breaks, or lists are lost in favor of fixed text cells (and that’s why it’s very good for printing). Good to know pdftk does the job!

  • http://carlosble.com Carlos Ble

    Thanks Ivan! Good to know, I guessed that but never really knew much about pdf internals.

  • Trivés Errante

    pdfsam (PDF Split and Merge) is another good option: a Java-based, open-source app.

  • http://www.carlosble.com/ Carlos Ble

    Thanks for the tip, I didn’t know it 🙂