Last modified: 2013-11-15 07:20:41 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and kept for historical purposes. It is not possible to log in, and apart from the display of bug reports and their history, links might be broken. See T38597, the corresponding Phabricator task, for complete and up-to-date bug report information.
Bug 36597 - Switch from jpeg to png for thumbnailing pdfs
Status: NEW
Product: MediaWiki extensions
Classification: Unclassified
Component: PdfHandler
Version: unspecified
Hardware: All All
Importance: Normal normal
Target Milestone: ---
Assigned To: Nobody - You can work on this!
Depends on:
Blocks:

Reported: 2012-05-07 15:34 UTC by Mark A. Hershberger
Modified: 2013-11-15 07:20 UTC
CC List: 6 users

See Also:
Web browser: ---
Mobile Platform: ---
Assignee Huggle Beta Tester: ---


Attachments
using png (58.00 KB, image/png), 2012-05-07 16:06 UTC, Mark A. Hershberger
downscaled png (54.69 KB, image/png), 2012-05-07 16:08 UTC, Mark A. Hershberger

Description Mark A. Hershberger 2012-05-07 15:34:59 UTC
From bug 36580:

<Robin_Watts> That looks a lot like you're rendering to JPEG - the ringing
              artifacts etc.
<chrisl> hexmode: the "heavily compressed" effect is, as Robin_Watts
         mentioned, because it's jpeg compressed - the solution is: don't use
         jpeg.....
Comment 1 Mark A. Hershberger 2012-05-07 16:06:13 UTC
Created attachment 10534
using png

Switching to PNG output instead of JPEG, using the command from bug 36580 comment 4, also results in a smaller file:

$ gs -sDEVICE=png16m -sOutputFile=after_gs.png -dFirstPage=1 -dLastPage=1 -r150 -dBATCH -dNOPAUSE -q Welcome2WP_English_082310.pdf

gives me a file size of 59,394 bytes instead of 143,024 bytes.
Comment 2 Mark A. Hershberger 2012-05-07 16:08:32 UTC
Created attachment 10535
downscaled png

The PNG after downscaling with convert; compare with attachment #10524 (https://bugzilla.wikimedia.org/attachment.cgi?id=10524).
Comment 3 Antoine "hashar" Musso (WMF) 2012-06-21 15:51:06 UTC
Gerrit change:

https://gerrit.wikimedia.org/r/6802
Comment 4 Philippe Elie 2012-09-05 15:33:06 UTC
The Ghostscript default quality for the jpeg device is 0.75 (-dJPEGQ=75). It would be better to first try a saner value such as -dJPEGQ=95 and compare the output size and quality of PNG and JPEG before switching to PNG. Switching to PNG could have a huge impact on Wikisource, which relies heavily on PDF files.
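A rough way to run the comparison suggested above, as a Python sketch: it renders the same page with the `png16m` device (as in comment 1) and with the `jpeg` device at `-dJPEGQ=95`, then reports both file sizes in bytes. The helper names `gs_command` and `compare_sizes` and the output file naming are illustrative assumptions; it assumes `gs` is on the PATH.

```python
import os
import subprocess
from typing import Dict, List


def gs_command(pdf_path: str, out_path: str, device: str,
               page: int = 1, dpi: int = 150, jpeg_quality: int = 95) -> List[str]:
    """Build a Ghostscript command line that renders a single page."""
    cmd = [
        "gs",
        f"-sDEVICE={device}",
        f"-sOutputFile={out_path}",
        f"-dFirstPage={page}",
        f"-dLastPage={page}",
        f"-r{dpi}",
        "-dBATCH",
        "-dNOPAUSE",
        "-q",
    ]
    if device == "jpeg":
        # The jpeg device defaults to JPEGQ=75; request higher quality explicitly.
        cmd.append(f"-dJPEGQ={jpeg_quality}")
    cmd.append(pdf_path)
    return cmd


def compare_sizes(pdf_path: str, page: int = 1) -> Dict[str, int]:
    """Render the same page as PNG and as JPEG and return file sizes in bytes.

    Requires `gs` on the PATH; output files are written to the current directory.
    """
    sizes: Dict[str, int] = {}
    for device, ext in (("png16m", "png"), ("jpeg", "jpg")):
        out_path = f"page{page}.{ext}"
        subprocess.run(gs_command(pdf_path, out_path, device, page=page), check=True)
        sizes[ext] = os.path.getsize(out_path)
    return sizes
```

Calling `compare_sizes("Welcome2WP_English_082310.pdf")` on the file from comment 1 would give the two sizes side by side; visual quality still has to be judged by eye.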
Comment 5 Dmitriy Sintsov 2012-09-06 04:57:52 UTC
PNG is better for pages with few colors; JPEG is better for pages with a wide range of colors.
Comment 6 Antoine "hashar" Musso (WMF) 2012-10-02 08:22:48 UTC
Copy-pasted from a comment I made on Gerrit change #6802:

----------------------------------------------------------
We could add a parameter to the thumb syntax to let the user choose the rendered format. Something like:

 [[File:foo.pdf|thumb|png]]
 [[File:foo.pdf|thumb|jpg]]

And have the default set by a global configuration variable such as $wgPdfThumbOutputFormat or something. That would get us the best of both worlds :-]
----------------------------------------------------------

That is definitely an easy change to the current patchset; I will be more than happy to review it :)
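A minimal sketch of how the proposed syntax could be resolved, in Python for illustration (PdfHandler itself is PHP): a `png` or `jpg` option in the file link overrides a site-wide default that stands in for the proposed `$wgPdfThumbOutputFormat`. The function names, the simplified link parsing and the `jpg` default (mirroring today's JPEG output) are assumptions, not existing MediaWiki code.

```python
from typing import List, Sequence, Tuple

# Format keywords a user may put in the thumb syntax (hypothetical set),
# normalised to the output format they select.
FORMAT_KEYWORDS = {"png": "png", "jpg": "jpg", "jpeg": "jpg"}

# Stand-in for the proposed $wgPdfThumbOutputFormat setting (hypothetical default).
DEFAULT_PDF_THUMB_FORMAT = "jpg"


def parse_file_link(wikitext: str) -> Tuple[str, List[str]]:
    """Split a [[File:...|...]] link into its target and options.

    Simplified: ignores nested links and templates.
    """
    text = wikitext.strip()
    if not (text.startswith("[[") and text.endswith("]]")):
        raise ValueError(f"not a wiki link: {wikitext!r}")
    target, *options = text[2:-2].split("|")
    return target, options


def resolve_thumb_format(options: Sequence[str],
                         default: str = DEFAULT_PDF_THUMB_FORMAT) -> str:
    """Return the last format keyword in `options`, else the site-wide default."""
    chosen = default
    for opt in options:
        key = opt.strip().lower()
        if key in FORMAT_KEYWORDS:
            chosen = FORMAT_KEYWORDS[key]
    return chosen


# Example usage with the syntax proposed above.
_, opts = parse_file_link("[[File:foo.pdf|thumb|png]]")
print(resolve_thumb_format(opts))  # png
_, opts = parse_file_link("[[File:foo.pdf|thumb]]")
print(resolve_thumb_format(opts))  # jpg (falls back to the default)
```

Letting the per-file option always win means a wiki could keep JPEG as its default while individual files, such as vector plots, opt into PNG.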
Comment 7 Dmitriy Sintsov 2012-10-02 10:41:20 UTC
It is better to count the number of colors on each PDF page, because one PDF file may combine low-color text pages with colorful illustrations. Or, if counting colors is too expensive, one could compress the same page both to lossless PNG and to 95%-quality JPEG and keep whichever image is smaller. For images with a wide range of colors, lossless PNG will be MUCH larger than high-quality (95%) JPEG.
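The two strategies in the comment above can be combined per page. This is a minimal Python sketch, assuming the page has already been rasterised to a list of RGB tuples and that the PNG and 95%-quality JPEG encoders are supplied as callables (for example, wrappers around Ghostscript or an image library). The cheap color count runs first and exits early; only pages above the threshold are encoded both ways. The 256-color threshold and all function names are hypothetical choices, not part of PdfHandler.

```python
from typing import Callable, Iterable, Set, Tuple

Pixel = Tuple[int, int, int]   # (R, G, B), each 0-255
Encoder = Callable[[], bytes]  # returns the encoded image for the page


def count_colors(pixels: Iterable[Pixel], limit: int) -> int:
    """Count distinct colors, stopping as soon as more than `limit` are seen."""
    seen: Set[Pixel] = set()
    for pixel in pixels:
        seen.add(pixel)
        if len(seen) > limit:
            break
    return len(seen)


def choose_page_format(pixels: Iterable[Pixel],
                       encode_png: Encoder,
                       encode_jpeg_q95: Encoder,
                       low_color_limit: int = 256) -> Tuple[str, bytes]:
    """Pick PNG or JPEG for one rendered PDF page."""
    # Cheap check first: few colors (text, line art) -> lossless PNG.
    if count_colors(pixels, low_color_limit) <= low_color_limit:
        return "png", encode_png()
    # Many colors: encode both and keep the smaller result.
    png_data = encode_png()
    jpeg_data = encode_jpeg_q95()
    if len(png_data) <= len(jpeg_data):
        return "png", png_data
    return "jpeg", jpeg_data
```

Because the decision is made per page, a PDF that mixes text pages with color illustrations can get PNG thumbnails for some pages and JPEG for others.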
Comment 8 Antoine "hashar" Musso (WMF) 2013-01-09 14:03:52 UTC
I have abandoned Gerrit change #6802 pending a proper design decision, which should be made in this bug report.
Comment 9 Andre Klapper 2013-01-23 13:17:15 UTC
[Patch in Gerrit got reviewed (and abandoned), hence resetting keyword]
Comment 10 Brion Vibber 2013-09-25 20:17:56 UTC
Changing the status back to 'new', since the previous patch was abandoned some time ago.
Comment 11 Dario Taraborelli 2013-11-15 07:20:41 UTC
I'm interested in following up on this bug, particularly for PDFs of vector graphs generated by data analysis software (such as R or Mathematica), which researchers (including myself) routinely upload to Commons.

The quality of the JPEG thumbnails of these PDF graphs is abysmal compared with that of a native PNG file.

Original files:
https://commons.wikimedia.org/wiki/File:Active_Editors_arwiki.pdf
https://commons.wikimedia.org/wiki/File:Active_Editors_arwiki_2.png

Thumbnails:
https://upload.wikimedia.org/wikipedia/commons/thumb/3/39/Active_Editors_arwiki.pdf/page1-1004px-Active_Editors_arwiki.pdf.jpg
https://upload.wikimedia.org/wikipedia/commons/0/06/Active_Editors_arwiki_2.png

The only other way to avoid these compression artifacts in vector plots is to upload them as SVG (which is rendered as PNG). However, PDF is often the default export option, and it is the most common format in which people are likely to donate scientific media to Commons.
