DjVu: Almost Awesome

Earlier today, I started reading about the DjVu family of document formats. It really sounds slick: file sizes much smaller than PNG (and far smaller than TIFF or PDF) for lossless data with the DjVuText format, file sizes much smaller than JPEG at equivalent quality with the DjVuPhoto format, and an advanced DjVuDocument format that separates the background photo from the foreground text and produces quite nice output. There are wonderful plugins for browsers on all platforms, and server-side support already in Debian for sending pages incrementally as clients need them.

I tried this out a bit and indeed it looks great on monochrome scans, and I made a quick try of DjVuPhoto as well. That part looks great.

So here’s the bad news.

Debian has no nice way to generate DjVuDocument files. There is a PS/PDF-to-DjVu converter that relies on a djvu driver for Ghostscript, but Debian does not include that driver. Strangely, the program that depends on this driver is in Debian main. (Bug filed.) That program will make background-separated images, but only if the foreground and background are separate objects in the input.

All Debian has is a program csepdjvu, which requires you to somehow manually separate the foreground and background images. Ugh.
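
To make concrete what "separating the foreground and background" means, here is a minimal Python sketch of the crudest possible approach: a fixed brightness threshold. This is an assumption-laden illustration, not the method any DjVu tool uses. The function names, the threshold of 128, and the flat fill color are all made up for the example, and the output would still have to be converted into whatever input csepdjvu expects, which this sketch does not attempt.

```python
def separate(gray, threshold=128):
    """Split a grayscale image (list of rows of 0-255 ints) into a
    foreground mask and a background with the foreground painted over.

    Hypothetical helper for illustration only: dark pixels are assumed
    to be text, everything else is assumed to be background.
    """
    # 1 marks a dark (foreground) pixel, 0 marks background.
    mask = [[1 if px < threshold else 0 for px in row] for row in gray]

    # Fill the holes left by the text with the mean background brightness.
    bg_pixels = [px for row in gray for px in row if px >= threshold]
    fill = sum(bg_pixels) // len(bg_pixels) if bg_pixels else 255

    background = [
        [fill if m else px for px, m in zip(row, mask_row)]
        for row, mask_row in zip(gray, mask)
    ]
    return mask, background


def to_pbm(mask):
    """Serialize the mask as a plain-text PBM (P1) image, where 1 is black."""
    height = len(mask)
    width = len(mask[0]) if mask else 0
    lines = ["P1", f"{width} {height}"]
    lines += [" ".join(str(bit) for bit in row) for row in mask]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    page = [
        [250, 10, 240],
        [20, 230, 30],
    ]
    mask, background = separate(page)
    print(to_pbm(mask), end="")
    print(background)
```

A fixed threshold like this falls apart on real scans with uneven lighting or colored text, which is exactly why doing the separation by hand is so unappealing.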

So there is no way using software in Debian to produce DjVuDocument files with automatic separation, either from scans or from a digital source. It appears that there may not be Free Software to do this from scans either. This fact is not made clear at all in the DjVu documentation that is around.

6 thoughts on “DjVu: Almost Awesome”

  1. PDF is just a container format, but you are correct that TIFF in PDF isn’t that wonderful in terms of compression sizes. However, PDF also supports JBIG2, which is very much better.

    There’s a skeleton set of free software for producing PDFs with JBIG2 images on my web page: http://www.imperialviolet.org/jbig2.html

    It’s actually the core of the code which compresses the largest set of JBIG2 PDFs in the world[1], but some of that isn’t open source I’m afraid.

    [1] http://books.google.com

  2. Last time I looked at DjVu, it seemed to be patent-protected, although decoder (but not encoder) software gets a free irrevocable license. But I’m not sure; in fact, I’d love to be wrong.

  3. I just went through the Goerzen family Christmas pics. Looks like a good time was had by all. Do you still have snow cover? We’ve still got ours from two weeks before Christmas.

    John Goerzen Reply:

    Oddly enough, we do still have snow on the ground. We have had only a little bit of time since early December when we didn’t. It’s been nice — and usual.

  4. I’m the Debian maintainer for djvu, and in contact with the upstream developers. A few points. (a) There is no patent issue: there used to be, but it was solved years ago. (b) There is *one* stupid gs driver that is under a GPL-incompatible license, which prevents it from being distributed together with Ghostscript. However, if someone were to write their own gs driver to do the same job (it is quite straightforward) and release that under a reasonable license, the problem would be solved. HINT HINT.

  5. As you have been told, there is a licensing problem with that Ghostscript driver which makes it incompatible with Ghostscript. However, that is just a minor piece which analyses PS/PDF files and converts them to DjVu.

    There is no free alternative for separating background from foreground (apart from some dirty scripts that process a few types of images). This is the main problem that stops the spread of DjVu to the masses. You can still make your own separation using image editing software if you like, but this is quite a tiring process. The monochrome (character) encoding, named JB2, works quite well and outperforms the JBIG2 used in some PDFs, getting close to the compression ratios of formatted text!
