Last modified: 2014-10-13 14:03:39 UTC
Would someone be so kind as to install pdf2djvu on labs? Thanks.
Use cases are welcome. :)
I occasionally need to convert public domain files that I find in PDF format. I need to trim components from them, and the PDFtk we already have on labs suits that purpose. While the files can be uploaded to Commons for Wikisource as PDFs, PDFs are generally inferior to DjVu at retaining line-by-line text, so they are less useful for the Wikisources. Having pdf2djvu will let me grab, trim, and convert files on labs, then push them to Commons. An example is https://commons.wikimedia.org/wiki/File:Electoral_Disabilities_of_Women.pdf which I have uploaded, though due to the poor PDF rendering I have to OCR it separately (a PITA). At the moment, I am pulling in one or two files a week.
and https://en.wikisource.org/wiki/Help:DjVu_files#Method_3_-_pdf2djvu
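For reference, the trim-then-convert part of that workflow could look roughly like the sketch below. It only assumes the standard `pdftk ... cat ... output ...` and `pdf2djvu -o` invocations; the file names, the page range, and the helper names (`trim_command`, `convert_command`, `trim_and_convert`) are hypothetical, and the upload to Commons is not covered.

```python
import shlex
import subprocess
from pathlib import Path


def trim_command(src, dst, first_page, last_page):
    """Build a PDFtk command that keeps pages first_page..last_page of src."""
    if first_page < 1 or last_page < first_page:
        raise ValueError("page range must satisfy 1 <= first_page <= last_page")
    return ["pdftk", str(src), "cat", f"{first_page}-{last_page}", "output", str(dst)]


def convert_command(src, dst):
    """Build a pdf2djvu command that writes a single DjVu file to dst."""
    return ["pdf2djvu", "-o", str(dst), str(src)]


def trim_and_convert(src, first_page, last_page, dry_run=True):
    """Trim src with PDFtk, then convert the trimmed PDF with pdf2djvu.

    Output names are derived from src (hypothetical naming scheme).
    With dry_run=True the commands are only printed, not executed.
    """
    src = Path(src)
    trimmed = src.with_name(src.stem + "_trimmed.pdf")
    djvu = src.with_suffix(".djvu")
    commands = [
        trim_command(src, trimmed, first_page, last_page),
        convert_command(trimmed, djvu),
    ]
    for cmd in commands:
        if dry_run:
            print(shlex.join(cmd))
        else:
            # check=True raises CalledProcessError if a tool exits non-zero
            subprocess.run(cmd, check=True)
    return commands
```

Calling `trim_and_convert("book.pdf", 3, 20)` only prints the two command lines; passing `dry_run=False` would actually run them, which needs both tools on the PATH.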
billinghurst, I installed djvudigital in /data/project/phetools to convert PDF to DjVu. Conversion fails in some rare cases of half-broken PDFs, but it is stable enough to use. The script that uses it is https://github.com/phil-el/phetools/blob/master/ocr/pdf_to_djvu.py and I can help you set it up on IRC.