pdf2htmlEX

I had multiple PDFs that I wanted to convert to ePub files (Apple Books on Mac handles ePubs better than PDFs). One of them was a scanned book made up entirely of images, so I was able to convert it to a series of images and then turn those into an ePub. The others had complex formatting and text, though, so I had to find a good tool for the job.

I found pdf2htmlEX, which converts a PDF to an HTML file while keeping all of its formatting and layout. Someone built another tool on top of it, pdf2epubEX, which produces an ePub instead.

I had some trouble getting it to work with Docker, but once it did work, it worked great. However, pdf2epubEX did weird things if the input file had a space in its name, which took me a while to figure out. I also couldn’t figure out how to use a custom table of contents (pdf2epubEX just inserted a boilerplate one that I couldn’t find a way to change without too much work).
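One way to sidestep the space-in-filename problem is to give the converter a copy of the PDF whose name has no whitespace. Here is a minimal Python sketch of that idea. The helper name `space_free_copy` and the `pdf2epub_` temp-directory prefix are my own inventions, not part of pdf2epubEX, and the sketch only prepares the file; you would still run the converter on the returned path yourself.

```python
import re
import shutil
import tempfile
from pathlib import Path


def space_free_copy(pdf_path, work_dir=None):
    """Copy pdf_path to a file whose name contains no whitespace.

    Hypothetical helper: it does not call pdf2epubEX, it only
    prepares an input file with a safer name.
    """
    src = Path(pdf_path)
    # Replace each run of whitespace in the file name with one underscore.
    safe_name = re.sub(r"\s+", "_", src.name)
    # Use the given working directory, or create a fresh temporary one.
    dest_dir = Path(work_dir) if work_dir else Path(tempfile.mkdtemp(prefix="pdf2epub_"))
    dest_dir.mkdir(parents=True, exist_ok=True)
    dest = dest_dir / safe_name
    # copy2 leaves the original untouched and preserves timestamps.
    shutil.copy2(src, dest)
    return dest
```

For example, `space_free_copy("My Scanned Book.pdf")` would return a path ending in `My_Scanned_Book.pdf`, which can then be handed to the converter in place of the original. Simply renaming the file by hand works just as well for a one-off.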
