Today I was wondering about converting a pdf made from scan of a book
into djvu, hopefully to reduce the size, without too much loss of
quality. My initial experiments with
pdf2djvu were a bit
discouraging, so I invested some time building
gsdjvu in order to be able
to run djvudigital.
Watching the messages from djvudigital I realized that the reason it
was achieving so much better compression was that it was using black
and white for the foreground layer by default. I also figured out that
the default 300dpi looks crappy since my source document is apparently
600dpi.
I then went back an compared djvudigital to pdf2djvu a bit more
carefully. My not-very-scientific conclusions:
- monochrome at higher resolution is better than coloured foreground
- higher resolution and (a little) lossy beats lower resolution
- at the same resolution,
djvudigitalgives nicer output, but at the same bit rate, comparable results are achievable withpdf2djvu.
Perhaps most compellingly, the output from pdf2djvu has sensible
metadata and is searchable in evince. Even with the --words option,
the output from djvudigital is not. This is possibly related to the
error messages like
Can't build /Identity.Unicode /CIDDecoding resource. See gs_ciddc.ps .
It could well be my fault, because building gsdjvu involved guessing at corrections for several errors.
comparing
GS_VERSIONto 900 doesn't work well, whenGS_VERSIONis a 5 digit number.GS_REVISIONseems to be what's wanted there.extra declaration of struct timeval deleted
-lz added to command to build mkromfs
Some of these issues have to do with building software from 2009 (the
instructions suggestion building with ghostscript 8.64) in a modern
toolchain; others I'm not sure. There was an upload of gsdjvu in
February of 2015, somewhat to my surprise. AT&T has more or less
crippled the project by licensing it under the CPL, which means
binaries are not distributable, hence motivation to fix all the rough
edges is minimal.
| Version | kilobytes per page | position in figure |
|---|---|---|
| Original PDF | 80.9 | top |
| pdf2djvu --dpi=450 | 92.0 | not shown |
| pdf2djvu --monochrome --dpi=450 | 27.5 | second from top |
| pdf2djvu --monochrome --dpi=600 --loss-level=50 | 21.3 | second from bottom |
| djvudigital --dpi=450 | 29.4 | bottom |
