Last modified: 2012-01-04 14:09:17 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and for historical purposes. It is not possible to log in and except for displaying bug reports and their history, links might be broken. See T34432, the corresponding Phabricator task for complete and up-to-date bug report information.
Bug 32432 - 10s miss service time for thumbnails requested via ms6.esams
10s miss service time for thumbnails requested via ms6.esams
Status: RESOLVED FIXED
Product: Wikimedia
Classification: Unclassified
General/Unknown (Other open bugs)
unspecified
All All
: High minor with 2 votes (vote)
: ---
Assigned To: Nobody - You can work on this!
http://upload.wikimedia.org/wikipedia...
: ops
Depends on:
Blocks:
  Show dependency treegraph
 
Reported: 2011-11-15 15:04 UTC by Saibo
Modified: 2012-01-04 14:09 UTC (History)
7 users (show)

See Also:
Web browser: ---
Mobile Platform: ---
Assignee Huggle Beta Tester: ---


Attachments

Description Saibo 2011-11-15 15:04:48 UTC
Thumbnails take ages (10 seconds) to be rendered after a purge.

Testcase 1:
http://commons.wikimedia.org/w/index.php?title=File:AaatestSonnepalmenstrand90.jpg&action=purge

wget http://upload.wikimedia.org/wikipedia/commons/thumb/0/01/AaatestSonnepalmenstrand90.jpg/303px-AaatestSonnepalmenstrand90.jpg
--2011-11-15 15:58:20--  http://upload.wikimedia.org/wikipedia/commons/thumb/0/01/AaatestSonnepalmenstrand90.jpg/303px-AaatestSonnepalmenstrand90.jpg
Auflösen des Hostnamen »upload.wikimedia.org«.... 91.198.174.234
Verbindungsaufbau zu upload.wikimedia.org|91.198.174.234|:80... verbunden.
HTTP Anforderung gesendet, warte auf Antwort... 200 OK
Länge: nicht spezifiziert [image/jpeg]
In »303px-AaatestSonnepalmenstrand90.jpg« speichern.

    [  <=>                                  ] 16.695      68,7K/s   in 0,2s

2011-11-15 15:58:31 (68,7 KB/s) - »303px-AaatestSonnepalmenstrand90.jpg« gespeichert [16695]

Download took 0.2 seconds but wget was waiting for the server to answer for 10 seconds.

Testcase 2:
Call http://commons.wikimedia.org/w/index.php?title=File:Turm_am_Thurn-und-Taxis-Platz_2010-02-08.jpg&action=purge

Then:
wget http://upload.wikimedia.org/wikipedia/commons/thumb/7/74/Turm_am_Thurn-und-Taxis-Platz_2010-02-08.jpg/308px-Turm_am_Thurn-und-Taxis-Platz_2010-02-08.jpg
--2011-11-15 16:02:04--  http://upload.wikimedia.org/wikipedia/commons/thumb/7/74/Turm_am_Thurn-und-Taxis-Platz_2010-02-08.jpg/308px-Turm_am_Thurn-und-Taxis-Platz_2010-02-08.jpg
Auflösen des Hostnamen »upload.wikimedia.org«.... 91.198.174.234
Verbindungsaufbau zu upload.wikimedia.org|91.198.174.234|:80... verbunden.
HTTP Anforderung gesendet, warte auf Antwort... 200 OK
Länge: nicht spezifiziert [image/jpeg]
In »308px-Turm_am_Thurn-und-Taxis-Platz_2010-02-08.jpg« speichern.

    [  <=>                                  ] 35.224      98,1K/s   in 0,4s

2011-11-15 16:02:15 (98,1 KB/s) - »308px-Turm_am_Thurn-und-Taxis-Platz_2010-02-08.jpg« gespeichert [35224]

→ 9 seconds of waiting at step "HTTP Anforderung gesendet, warte auf Antwort..." (that is after(!) DNS lookup)

A subsequent query of the same image is fast:
wget http://upload.wikimedia.org/wikipedia/commons/thumb/7/74/Turm_am_Thurn-und-Taxis-Platz_2010-02-08.jpg/308px-Turm_am_Thurn-und-Taxis-Platz_2010-02-08.jpg
--2011-11-15 16:03:13--  http://upload.wikimedia.org/wikipedia/commons/thumb/7/74/Turm_am_Thurn-und-Taxis-Platz_2010-02-08.jpg/308px-Turm_am_Thurn-und-Taxis-Platz_2010-02-08.jpg
Auflösen des Hostnamen »upload.wikimedia.org«.... 91.198.174.234
Verbindungsaufbau zu upload.wikimedia.org|91.198.174.234|:80... verbunden.
HTTP Anforderung gesendet, warte auf Antwort... 200 OK
Länge: 35224 (34K) [image/jpeg]
In »308px-Turm_am_Thurn-und-Taxis-Platz_2010-02-08.jpg.1« speichern.

100%[======================================>] 35.224      --.-K/s   in 0,1s

2011-11-15 16:03:13 (236 KB/s) - »308px-Turm_am_Thurn-und-Taxis-Platz_2010-02-08.jpg.1« gespeichert [35224/35224]

Below a second.



I get similar timing when watching the net traffic after purge with Firebug in Firefox.
Comment 1 Antoine "hashar" Musso (WMF) 2011-11-15 15:06:06 UTC
adding URL for original image:
http://upload.wikimedia.org/wikipedia/commons/0/01/AaatestSonnepalmenstrand90.jpg
Comment 2 Antoine "hashar" Musso (WMF) 2011-11-15 15:20:25 UTC
TIP: Under unix, you can use LC_MESSAGES="C" before your command to have the output using the default C language (aka english).

We probably are lacking some CPU power for generating thumbnails. Is this an issue with those two images or is that a general issue?

Please note we have a project to replace ImageMagick (software used to render thumbnails) by VIPS which will give us better performances. See:
http://www.gossamer-threads.com/lists/wiki/wikitech/257118
and bug 28135
Comment 3 Saibo 2011-11-15 16:07:24 UTC
(In reply to comment #2)
> TIP: Under unix, you can use LC_MESSAGES="C" before your command to have the
> output using the default C language (aka english).
off topic: Thanks, will try to remember :-)
 
> We probably are lacking some CPU power for generating thumbnails. Is this an
> issue with those two images or is that a general issue?
General issue. Those were just to demonstrate.

I would expect no longer rendering times than about 3 seconds. That is really not a computational hard task. Raw processing time is probably fractions of a second. 

It is like this since weeks. I did not notice this before the 1.18 update.
Comment 4 Roan Kattouw 2011-11-16 10:46:29 UTC
It's not necessarily the thumbnail *generation* that's slow. The thumbnail *storage* server (ms5) is also known to be slow, and it wouldn't be hit in normal operation because the image would be cached by the upload Squids.

So when generating a new thumb you're hitting both the scalers and ms5, and when requesting a cached thumb you're hitting neither. That makes it hard to tell which one is the culprit from your timing data alone.
Comment 5 Saibo 2011-11-16 12:30:14 UTC
(In reply to comment #4)
>That makes it hard to
> tell which one is the culprit from your timing data alone.

Can't you reproduce it?
Comment 6 TMg 2011-12-20 04:04:37 UTC
Should not be to hard to reproduce this. Currently it happens all the time for all new images or simply when creating a gallery in a size that was never rendered before. For example, go to http://toolserver.org/~daniel/WikiSense/Gallery.php?wikifam=commons.wikimedia.org and switch to 250 images per page.

I know this is not very helpful for you. What can we do to track this down? I think it's a major issue especially in conjunction with old thumbnails not updating, see bug 31680 and others.
Comment 7 Alexander Karnstedt 2011-12-22 17:17:15 UTC
Especially list articles with many images suffer from this problem. Sometimes it takes ages to load such a page with dozens of thumbnails. After creating all the thumbnails, the performance is OK for a few hours. But the next day the kerfuffle starts again.

Are the thumbnails being purched in between?
Comment 8 Mark A. Hershberger 2011-12-22 23:51:00 UTC
Bumping to highest, tagging platformeng to get someone to look at this ASAP.
Comment 9 Rob Lanphier 2012-01-03 22:35:53 UTC
I imagine this is going to be rendered obsolete by the SwiftMedia deployment and by VIPS when we get there, but there might be a short term problem.  Putting on the "ops" keyword in case there's something that can be tuned in the short-term.
Comment 10 Tim Starling 2012-01-03 23:45:37 UTC
Marking invalid. The supplied test case works for me. 

Yes, it's possible that at some point in the past, thumbnails were slow to render, but urgent site issues should be discussed on IRC while they're actually happening, not on Bugzilla 6 weeks after the fact.

Maybe at some point in the future we will have sufficient historical service time data to let us confirm or maybe even diagnose this sort of thing after the fact. But at the moment we are mostly limited to real time analysis.

(In reply to comment #6)
> Should not be to hard to reproduce this. Currently it happens all the time for
> all new images or simply when creating a gallery in a size that was never
> rendered before. For example, go to
> http://toolserver.org/~daniel/WikiSense/Gallery.php?wikifam=commons.wikimedia.org
> and switch to 250 images per page.

The browser should limit concurrency to about 5, so say if it takes 2 seconds per image, then that would be 1 minute 40 to load all images. Is that what you see?

We only have 60 threads on 48 cores for the whole image scaling cluster, so obviously if you want 250 images at a time, it's going to take a while, even with unlimited client-side concurrency. If you don't want it to be slow, then don't do that.

Maybe ms5 will hit an I/O limit before we get to 60 threads, but it's clear from the profiling data that ms5 is not permanently in such a state. The average thumb.php service time measured by MediaWiki is only 156ms.
Comment 11 Saibo 2012-01-04 03:53:25 UTC
(In reply to comment #10)
> Marking invalid. The supplied test case works for me. 
> 
> Yes, it's possible that at some point in the past, thumbnails were slow to
> render, but urgent site issues should be discussed on IRC while they're
> actually happening, not on Bugzilla 6 weeks after the fact.

One time there is no one on IRC who responds in wikimedia-tech, next time I get told to file it to bugzilla...  *grr*  Don't tell my bug report is INVALID!


> 
> Maybe at some point in the future we will have sufficient historical service
> time data to let us confirm or maybe even diagnose this sort of thing after the
> fact. But at the moment we are mostly limited to real time analysis.
> 
> (In reply to comment #6)
> > Should not be to hard to reproduce this. Currently it happens all the time for
> > all new images or simply when creating a gallery in a size that was never
> > rendered before. For example, go to
> > http://toolserver.org/~daniel/WikiSense/Gallery.php?wikifam=commons.wikimedia.org
> > and switch to 250 images per page.
> 
> The browser should limit concurrency to about 5, so say if it takes 2 seconds
> per image, then that would be 1 minute 40 to load all images. Is that what you
> see?
> 
> We only have 60 threads on 48 cores for the whole image scaling cluster, so
> obviously if you want 250 images at a time, it's going to take a while, even
> with unlimited client-side concurrency. If you don't want it to be slow, then
> don't do that.
> 
> Maybe ms5 will hit an I/O limit before we get to 60 threads, but it's clear
> from the profiling data that ms5 is not permanently in such a state. The
> average thumb.php service time measured by MediaWiki is only 156ms.

I saw it also for a single image as I have clearly written above.. maybe your reading ability is INVALID?!  

Testcase 1 still is like I reported it. Maybe you should try from a German IP.
Comment 12 Tim Starling 2012-01-04 07:58:46 UTC
Thanks for your report on IRC. I was able to track down the problem. It turned out to be the configuration of a proxy server in the Amsterdam cluster. It didn't have anything to do with thumbnail rendering or MediaWiki. Instead, nginx was waiting for a connection timeout.
Comment 13 Saibo 2012-01-04 14:09:17 UTC
(In reply to comment #12)
> Thanks for your report on IRC. I was able to track down the problem. It turned
> out to be the configuration of a proxy server in the Amsterdam cluster. It
> didn't have anything to do with thumbnail rendering or MediaWiki. Instead,
> nginx was waiting for a connection timeout.

Confirm. Seems to be fixed. Thanks.

Note You need to log in before you can comment on or make changes to this bug.


Navigation
Links