DjVu: Almost Awesome

Earlier today, I started reading about the DjVu family of document formats. It really sounds slick: lossless files much smaller than PNG (and far smaller than TIFF or PDF) with the DjVuBitonal format, files much smaller than JPEG at equivalent quality with the DjVuPhoto format, and an advanced DjVuDocument format that separates the background photo from the foreground text and produces quite nice output. There are wonderful browser plugins for all platforms, and server-side support, already in Debian, for sending pages incrementally as clients need them.

I tried this out a bit, and indeed it looks great on monochrome scans. A quick try of DjVuPhoto looked great as well.
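For anyone who wants to repeat that kind of quick test, here is a minimal sketch of driving the two single-layer encoders from a script. It assumes the DjVuLibre tools cjb2 (bitonal encoder) and c44 (photo encoder) are installed. The file names, the 300 dpi value, and the helper function names are placeholders made up for illustration, not the commands used for the tests above.

```python
import shutil
import subprocess


def bitonal_cmd(src, dst, dpi=300):
    """Build a command line for cjb2, DjVuLibre's bitonal encoder."""
    return ["cjb2", "-dpi", str(dpi), src, dst]


def photo_cmd(src, dst, dpi=300):
    """Build a command line for c44, DjVuLibre's wavelet photo encoder."""
    return ["c44", "-dpi", str(dpi), src, dst]


def run_if_available(cmd):
    """Run cmd only if its executable is on PATH; return True if it ran."""
    if shutil.which(cmd[0]) is None:
        return False
    subprocess.run(cmd, check=True)
    return True


if __name__ == "__main__":
    # Placeholder inputs: a PBM monochrome scan and a PPM photo.
    print(" ".join(bitonal_cmd("scan.pbm", "scan.djvu")))
    print(" ".join(photo_cmd("photo.ppm", "photo.djvu")))
```

The builders are kept separate from the runner so the command lines can be inspected before anything is executed.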

So here’s the bad news.

Debian has no nice way to generate DjVuDocument files. There is a PS/PDF-to-DjVu converter that uses a djvu driver for Ghostscript, but Debian does not include that driver. Strangely, the program that depends on this driver is in Debian main anyway. (Bug filed.) That program will make background-separated images, but only if the foreground and background are already separate objects in the input.

All Debian has is a program csepdjvu, which requires you to somehow manually separate the foreground and background images. Ugh.
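To make "manually separate" concrete, here is a deliberately naive sketch of what a separation step does: a single global threshold splits a grayscale page into a foreground mask (the ink) and a background with the ink painted over. The function name, the threshold of 128, and the fill strategy are all assumptions for illustration. Real segmentation is far more involved, and this does not produce the input format csepdjvu expects.

```python
def separate(gray, threshold=128):
    """Split a grayscale image (rows of 0-255 ints) into mask and background.

    Returns (mask, background). mask[y][x] is True for dark "ink" pixels.
    background is a copy of the image in which ink pixels are replaced by
    the integer mean of the remaining "paper" pixels (255 if there are none).
    """
    mask = [[px < threshold for px in row] for row in gray]
    paper = [px for row in gray for px in row if px >= threshold]
    fill = sum(paper) // len(paper) if paper else 255
    background = [
        [fill if is_ink else px for px, is_ink in zip(row, mask_row)]
        for row, mask_row in zip(gray, mask)
    ]
    return mask, background


if __name__ == "__main__":
    # A tiny 2x3 "page": light paper with one dark column of ink.
    page = [[250, 20, 240],
            [245, 30, 235]]
    mask, background = separate(page)
    print(mask)
    print(background)
```

A global threshold like this fails on anything but clean, evenly lit pages, which is one reason automatic separation is the hard part.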

So there is no way, using software in Debian, to produce DjVuDocument files with automatic separation, either from scans or from a digital source. It appears that there may not be any Free Software that can do this from scans, either. None of this is made clear in the existing DjVu documentation.