Last modified: 2013-11-23 00:39:38 UTC
For example with this category: https://commons.wikimedia.org/wiki/Category:Media_contributed_by_Zentralbibliothek_Z%C3%BCrich_%28original_picture%29
This was not the case a few weeks ago, so something has changed (in the wrong way) in the web server/proxy configuration or in MW code base.
Thanks for taking the time to report this! Confirming.
Issue appears to be with https://commons.wikimedia.org/wiki/File:Zentralbibliothek_Zürich_-_Heinrich_Bullingers_Westerhemd_-_000012135.tif
Not even {{filepath}} works with that file, what's the URL to the original?
It has an img_metadata field of:

  a:1:{s:6:"errors";a:1:{i:0;s:85:"tiffinfo command failed: '/usr/bin/tiffinfo' '/tmp/localcopy_2bbfcd346e5d-1.tif' 2>&1";}}

This would fail the isMetadataValid test for:

  if ( !isset( $metadata['TIFF_METADATA_VERSION'] ) ) {
      return false;
  }

So presumably, PagedTiffHandler tries to re-extract the metadata on every request, which is probably hanging.
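To make the reasoning above concrete, here is a minimal Python sketch of the same check. It is not PagedTiffHandler code: the function name `is_metadata_valid` and the dict `stored_metadata` are illustrative stand-ins, and the dict is just a hand-written Python equivalent of the serialized PHP array quoted above.

```python
# Hypothetical model of the quoted check; not the extension's real code.
VERSION_KEY = "TIFF_METADATA_VERSION"


def is_metadata_valid(metadata):
    """Return False when the version key is missing, like the quoted PHP test."""
    if not isinstance(metadata, dict) or VERSION_KEY not in metadata:
        return False
    return True


# Python equivalent of the img_metadata value stored for the broken file.
stored_metadata = {
    "errors": [
        "tiffinfo command failed: '/usr/bin/tiffinfo' "
        "'/tmp/localcopy_2bbfcd346e5d-1.tif' 2>&1"
    ]
}

if __name__ == "__main__":
    # The error record has no version key, so it is judged invalid and
    # the caller would try to extract the metadata again.
    print(is_metadata_valid(stored_metadata))
```

Because the stored record only contains an "errors" entry, the check can never pass for this file, which matches the presumption that extraction is retried on every request.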
(In reply to comment #4)
> Not even {{filepath}} works with that file, what's the URL to the original?

bawolff@Bawolff-L:/var/www/w/extensions/PagedTiffHandler$ echo -n Zentralbibliothek_Zürich_-_Heinrich_Bullingers_Westerhemd_-_000012135.tif | md5sum
fa1ecb93ed05e8902d3b69a97d726207

So that would make:
https://upload.wikimedia.org/wikipedia/commons/f/fa/Zentralbibliothek_Zürich_-_Heinrich_Bullingers_Westerhemd_-_000012135.tif
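The derivation above (MD5 of the file name, then the first hex digit and the first two hex digits as directory levels) can be sketched in Python as follows. The helper name `hashed_upload_url` and the `UPLOAD_BASE` constant are made up for illustration; the sketch assumes the name is hashed as UTF-8 with spaces already turned into underscores, and it leaves the name unencoded in the URL, as the comment above does.

```python
import hashlib

# Base URL taken from the comment above; treated here as a plain constant.
UPLOAD_BASE = "https://upload.wikimedia.org/wikipedia/commons"


def hashed_upload_url(filename, base=UPLOAD_BASE):
    """Build <base>/<h1>/<h1h2>/<name> from the MD5 hex digest of the name."""
    name = filename.replace(" ", "_")
    digest = hashlib.md5(name.encode("utf-8")).hexdigest()
    return f"{base}/{digest[0]}/{digest[:2]}/{name}"


if __name__ == "__main__":
    print(hashed_upload_url(
        "Zentralbibliothek_Zürich_-_Heinrich_Bullingers_Westerhemd_-_000012135.tif"
    ))
```

For the file in this report, the md5sum output quoted above starts with fa, which is where the f/fa/ part of the URL comes from.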
I was looking at this lately (due to temp files filling up /tmp). That file hangs on my own computer too trying to import it into mediawiki:

[01:22:17] <AaronSchulz> read(8, "identify: Memory allocation fail"..., 8192) = 100
[01:22:19] <AaronSchulz> wait4(30506, 0x7fff5e7000d4, WNOHANG|WSTOPPED, NULL) = 0
[01:22:20] <AaronSchulz> select(12, [8 11], [], [], NULL
[01:22:30] <AaronSchulz> ...and stuck
Change 96897 had a related patch set uploaded by Brian Wolff: Do not repetitively extract metadata of broken tiff files. https://gerrit.wikimedia.org/r/96897
FYI, https://gerrit.wikimedia.org/r/#/c/29913/ : but it would be too simple if it was just that. :)

If I issue tiffinfo on the file, I get 700 MB worth of an endless repetition of 0x81,0xff,0x81,0xff,0x81,0xff etc. I hope this is not legit even for such a crazy format as TIFF?

$ /usr/bin/time -v tiffinfo Zentralbibliothek_Zürich_-_Heinrich_Bullingers_Westerhemd_-_000012135.tif | wc
TIFFReadDirectory: Warning, Zentralbibliothek_Zürich_-_Heinrich_Bullingers_Westerhemd_-_000012135.tif: wrong data type 7 for "RichTIFFIPTC"; tag ignored.
TIFFReadDirectory: Warning, Zentralbibliothek_Zürich_-_Heinrich_Bullingers_Westerhemd_-_000012135.tif: unknown field with tag 37724 (0x935c) encountered.
        Command being timed: "tiffinfo Zentralbibliothek_Zürich_-_Heinrich_Bullingers_Westerhemd_-_000012135.tif"
        User time (seconds): 16.02
        System time (seconds): 1.13
        Percent of CPU this job got: 52%
        Elapsed (wall clock) time (h:mm:ss or m:ss): 0:32.72
        Average shared text size (kbytes): 0
        Average unshared data size (kbytes): 0
        Average stack size (kbytes): 0
        Average total size (kbytes): 0
        Maximum resident set size (kbytes): 1646320
        Average resident set size (kbytes): 0
        Major (requiring I/O) page faults: 1
        Minor (reclaiming a frame) page faults: 103068
        Voluntary context switches: 811
        Involuntary context switches: 172460
        Swaps: 0
        File system inputs: 274552
        File system outputs: 8
        Socket messages sent: 0
        Socket messages received: 0
        Signals delivered: 0
        Page size (bytes): 4096
        Exit status: 0
    250     264 699801833
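For anyone who wants to repeat a measurement like the one above without risking a hung terminal, here is a rough Python sketch that runs a command with a wall-clock timeout and reports how many bytes it wrote to stdout. The helper name `measure_output` is made up for this example, and the 60-second limit in the usage line is an arbitrary assumption, not a value from this report. Output goes to a temporary file, so a 700 MB dump is not held in memory.

```python
import os
import subprocess
import sys
import tempfile


def measure_output(cmd, timeout_s):
    """Run cmd; return (exit_status_or_None_on_timeout, stdout_bytes)."""
    with tempfile.TemporaryFile() as out:
        try:
            proc = subprocess.run(
                cmd,
                stdout=out,                 # child writes straight to the file
                stderr=subprocess.DEVNULL,  # warnings are not counted
                timeout=timeout_s,
            )
            status = proc.returncode
        except subprocess.TimeoutExpired:
            # subprocess.run has already killed and reaped the child here.
            status = None
        size = os.fstat(out.fileno()).st_size
    return status, size


if __name__ == "__main__" and len(sys.argv) > 1:
    # Example: python measure.py tiffinfo some_file.tif
    print(measure_output(sys.argv[1:], timeout_s=60))
```

A `None` status means the command was still running when the timeout hit; the byte count then shows how much it had written up to that point.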
(In reply to comment #8)
> Change 96897 had a related patch set uploaded by Brian Wolff:
> Do not repetitively extract metadata of broken tiff files.
>
> https://gerrit.wikimedia.org/r/96897

This is just a patch so that we don't get bogged down on every request extracting metadata that will ultimately fail (which is what we already do for most other formats). We should still figure out what's going on here, separate from this patch.
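The idea behind the patch can be sketched in Python as a small simulation. This is not the code from change 96897: `valid_before`, `valid_after` and `count_extractions` are hypothetical names, and the sketch assumes the fix amounts to treating a stored error record as final instead of as a reason to retry.

```python
# Hypothetical simulation; names and structure are assumptions for illustration.
VERSION_KEY = "TIFF_METADATA_VERSION"


def valid_before(metadata):
    """Pre-patch behaviour: anything without the version key is invalid."""
    return isinstance(metadata, dict) and VERSION_KEY in metadata


def valid_after(metadata):
    """Assumed post-patch behaviour: a recorded failure is accepted as-is."""
    if isinstance(metadata, dict) and "errors" in metadata:
        return True
    return valid_before(metadata)


def count_extractions(is_valid, requests):
    """Count how often a broken file triggers re-extraction over N requests."""
    metadata = {"errors": ["tiffinfo command failed"]}  # state after upload
    extractions = 0
    for _ in range(requests):
        if not is_valid(metadata):
            extractions += 1  # the slow, failing tiffinfo call in real life
            metadata = {"errors": ["tiffinfo command failed"]}
    return extractions


if __name__ == "__main__":
    print(count_extractions(valid_before, 5), count_extractions(valid_after, 5))
```

With the first check every request pays for a failing extraction; with the second, the failure recorded at upload time is reused and later requests do no extraction at all.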
Note, time is not particularly reliable for memory, but top reports 500+ MB VIRT and about 300 RES. So I guess it's killed for that reason?
Change 96897 merged by jenkins-bot: Do not repetitively extract metadata of broken tiff files. https://gerrit.wikimedia.org/r/96897
Page https://commons.wikimedia.org/wiki/Category:Media_contributed_by_Zentralbibliothek_Z%C3%BCrich_%28original_picture%29 is again available. Therefore, the most critical aspect of this bug is IMO fixed. Thank you.
With the above patch, new uploads may still hit a delay or timeouts due to a libtiff bug for some files, though views of them will be fast. Categories and pages using them will also be fast now. ?action=purge will still be slow for affected file description pages.