Last modified: 2011-11-15 14:58:53 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and for historical purposes. It is not possible to log in, and except for displaying bug reports and their history, links might be broken. See T32086, the corresponding Phabricator task, for complete and up-to-date bug report information.
Bug 30086 - Upload problems : Slow / timeouts
Status: RESOLVED FIXED
Product: Wikimedia
Classification: Unclassified
Component: General/Unknown
Version: unspecified
Hardware: All All
Importance: Highest critical with 1 vote
Target Milestone: ---
Assigned To: Sam Reed (reedy)
Keywords: ops
Depends on:
Blocks: 30027
Reported: 2011-07-27 18:47 UTC by Maarten Dammers
Modified: 2011-11-15 14:58 UTC
CC List: 16 users

See Also:
Web browser: ---
Mobile Platform: ---
Assignee Huggle Beta Tester: ---



Description Maarten Dammers 2011-07-27 18:47:11 UTC
I've been having some upload problems recently. I've tested this in different situations:
* Upload Wizard upload is very slow (see also #30027)
* Uploads using a bot from my home connection using the api will time out with larger files
* Uploads using a bot from the Toolserver using the api will time out with larger files
* Uploads from my (fast) work connection using classic special:upload and using the api will time out with larger files
Comment 1 Sam Reed (reedy) 2011-07-27 18:53:02 UTC
*** Bug 30027 has been marked as a duplicate of this bug. ***
Comment 2 Brion Vibber 2011-07-28 23:33:28 UTC
Do you have some sample files and sample code to upload them that regularly reproduces the timeouts?
Comment 3 mattwj2002 2011-07-31 21:21:18 UTC
It has been very slow uploading to the Wikimedia Commons on both the basic upload and the regular upload.
 
In addition, I have been having problems uploading with upload.py. I have the newest version from Subversion.
 
It eventually uploads, but an upload that should take minutes is taking hours.
 
I have a 22 Mbps / 7 Mbps connection.
 
Here is a log of an example upload:
 
http://pastebin.com/A5Upvr31
 
Please fix this as soon as possible.  Some people probably are not uploading because it is so slow.  I think this issue is very important.
Comment 4 mattwj2002 2011-07-31 21:37:18 UTC
Correction, the pastebin should be the following:

http://pastebin.com/A5Upvr31

Sorry!
Comment 5 mattwj2002 2011-07-31 21:40:52 UTC
Bugzilla is having issues.  When I post the link it is changing it.  I am trying it with a space.

http://pastebin.com/ A5Upvr31
Comment 6 Mark A. Hershberger 2011-08-01 13:31:40 UTC
link problem at Bug 30161
Comment 7 Brion Vibber 2011-08-03 14:20:19 UTC
I'm definitely seeing a verrrry slow upload of a 78 MB Ogg file to Commons, though I can't be sure whether it's the server end or the Wikimania network.

It seems to be spiking up briefly, then halting for a while, which could be an indication of lost packets delaying the upload stream as it waits to time out.

Peaks are 60-130 KB/sec, but sustained rates are often much lower: 6, 12, 25 KB/sec.
Comment 8 prolineserver 2011-08-03 20:55:31 UTC
I cannot upload files of ~2-3 MB from the Toolserver either:
Uploading file to commons:commons via API....
<urlopen error timed out>
WARNING: Could not open 'http://commons.wikimedia.org/w/api.php'. Maybe the server or your connection is down. Retrying in 1 minutes...
<urlopen error timed out>
WARNING: Could not open 'http://commons.wikimedia.org/w/api.php'. Maybe the server or your connection is down. Retrying in 2 minutes...
<urlopen error timed out>
WARNING: Could not open 'http://commons.wikimedia.org/w/api.php'. Maybe the server or your connection is down. Retrying in 4 minutes...
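The log shows the client doubling its wait after each timeout (1, 2, 4 minutes). A minimal sketch of that exponential-backoff pattern follows, assuming the upload is wrapped in a zero-argument callable; the function names are hypothetical and this is not pywikipedia's actual retry code.

```python
import time
from typing import Callable, List, Tuple, Type, TypeVar

T = TypeVar("T")


def backoff_delays(base_s: float, retries: int) -> List[float]:
    """Wait before each retry: base, 2*base, 4*base, ..."""
    return [base_s * 2 ** i for i in range(retries)]


def call_with_backoff(
    func: Callable[[], T],
    retries: int = 5,
    base_s: float = 60.0,
    retry_on: Tuple[Type[BaseException], ...] = (OSError,),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call func(); after each failure, sleep with exponential backoff and retry.

    urllib.error.URLError and socket timeouts both derive from OSError, so the
    default retry_on covers the "<urlopen error timed out>" case in the log.
    """
    for delay in backoff_delays(base_s, retries):
        try:
            return func()
        except retry_on as exc:
            print(f"{exc!r}; retrying in {delay / 60:g} minutes...")
            sleep(delay)
    return func()  # final attempt: any exception now propagates to the caller
```

Backoff only spaces out retries; if every attempt at a 2-3 MB file times out on the server side, the retries will keep failing, so it helps with transient errors but cannot mask a persistent bottleneck.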
Comment 9 Ra Boe 2011-08-05 07:07:11 UTC
In Commonist 0.4.17: "unexpected response: HTTP/1.0 502 Bad Gateway"
Comment 10 Sumana Harihareswara 2011-08-05 10:51:24 UTC
Just heard another report of this from Martina Nolte, who ten minutes ago tried again to upload via Commonist: "Commonist now starts to upload a second image and then fails with "HTTP/1.0 502 Bad Gateway"."
Comment 11 Martina 2011-08-05 12:28:52 UTC
(In reply to comment #7)
> I can't be sure whether it's the server end or the Wikimania network.

It's not a Wikimania problem. People at home have had the same bug since 3 August:
http://commons.wikimedia.org/wiki/Commons_talk:Tools/Commonist#Upload_problem
http://commons.wikimedia.org/wiki/Commons:Forum#unexpected_response:_HTTP.2F1.0_502_Bad_Gateway
Comment 12 Sumana Harihareswara 2011-08-05 13:34:38 UTC
Ryan Lane just mentioned to me that this seems like a problem with the Java app (Commonist?); any issues on the Wikimedia side seem fixed.
Comment 13 Sam Reed (reedy) 2011-08-05 13:36:10 UTC
(In reply to comment #12)
> Ryan Lane just mentioned to me that this seems like a problem with the Java app
> (Commonist?); any issues on the Wikimedia side seem fixed.

Mark fixed an issue with one of the API apaches being incorrectly configured earlier today. Waiting to see if that fixes the 502 issues we've been seeing.
Comment 14 prolineserver 2011-08-05 13:41:19 UTC
Still not working with pywikipedia from the toolserver:

Uploading file to commons:commons via API....
HTTPError: 502 Bad Gateway
WARNING: Could not open 'http://commons.wikimedia.org/w/api.php'.
Maybe the server is down. Retrying in 1 minutes...
Comment 15 Sam Reed (reedy) 2011-08-05 13:44:44 UTC
HTTPError: 502 Bad Gateway
WARNING: Could not open 'http://commons.wikimedia.org/w/api.php'.
Comment 16 Neil Kandalgaonkar 2011-08-05 16:16:28 UTC
For everyone following this bug -- the 502/504 error issue is being separately tracked in #30201.
Comment 17 Sam Reed (reedy) 2011-08-05 21:12:27 UTC
The API-specific errors have been fixed; I wonder if this has any effect on the upload issues.
Comment 18 Martina 2011-08-06 10:46:49 UTC
Commonist upload runs perfectly now. Thanks to all who helped!
Comment 19 inductiveload 2011-08-07 01:48:55 UTC
Uploading through the upload form is still pretty slow: 16 minutes for a 40 MB file (an average speed of about 42 KB/s). My upload connection is 7+ Mbps (tested just after uploading), and uploads to the Internet Archive are as fast as I expected (about a minute), so it must be a Commons-related problem.

I have heard, but not yet checked myself, that pywikipedia has upload token issues too.
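The reports in this thread mix file sizes in megabytes with line rates in megabits per second, so a quick conversion helps compare them. A minimal sketch, assuming "MB" means 1024 KiB and that ISP line rates use 10^6 bits per second; the function names are illustrative only.

```python
def average_rate_kib_s(size_mib: float, seconds: float) -> float:
    """Average transfer rate in KiB/s, treating 1 MB as 1024 KiB."""
    return size_mib * 1024 / seconds


def line_rate_kib_s(mbps: float) -> float:
    """Convert an ISP line rate in megabits/s (10**6 bit/s) to KiB/s."""
    return mbps * 1_000_000 / 8 / 1024


# Comment 19: a 40 MB file took 16 minutes on a 7 Mbps uplink.
observed = average_rate_kib_s(40, 16 * 60)  # about 42.7 KiB/s
capacity = line_rate_kib_s(7)               # about 854.5 KiB/s
print(f"{observed:.1f} KiB/s, {observed / capacity:.0%} of the uplink")
```

On these figures the upload used only about 5% of the available uplink, which supports the reporter's conclusion that the local connection was not the limit.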
Comment 20 Maarten Dammers 2011-08-08 09:29:38 UTC
No guys. This is not resolved at all. I guess we have two problems giving the same result (upload problems):
1. Bugged proxy giving 502's (solved in #30201)
2. General slowness of upload

Just take a look at the timeline at https://secure.wikimedia.org/wikipedia/commons/w/index.php?title=Special:ListFiles&user=BotMultichillT to see how slow it is.
These are pictures uploaded from the toolserver.
Comment 21 Mark A. Hershberger 2011-08-08 19:21:30 UTC
> Just take a look at the timeline at
> https://secure.wikimedia.org/wikipedia/commons/w/index.php?title=Special:ListFiles&user=BotMultichillT
> to see how slow it is.
> These are pictures uploaded from the toolserver.

I don't know anything about that bot, but, using the API, I charted
the time between uploads against the size of the uploads (the closest
approximation I could think of for speed).  I did notice a little
slowdown yesterday but it seems to be back now.

The timeline, AFAICT, does not support your assertion that something
is still unresolved.

Feel free to reopen and let me know what I should look for in the timeline of
that bot if you feel like there is still a problem.
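Charting the gap between consecutive uploads against the size of each upload gives a rough speed estimate. A minimal sketch of that calculation follows, assuming the (timestamp, size) pairs have already been collected (for example from the bot's file list) and that timestamps are ISO 8601 UTC strings; the function name is hypothetical and this is not Mark's actual script.

```python
from datetime import datetime
from typing import Iterable, List, Tuple


def implied_rates(uploads: Iterable[Tuple[str, int]]) -> List[Tuple[datetime, float]]:
    """Estimate bytes/s for each upload from the gap since the previous one.

    `uploads` holds (timestamp, size_in_bytes) pairs; timestamps are assumed to
    look like "2011-08-08T19:21:30Z". The gap also contains processing and bot
    idle time, so each figure is a lower bound on the real transfer rate, and
    the earliest upload yields no estimate.
    """
    parsed = sorted(
        (datetime.strptime(ts, "%Y-%m-%dT%H:%M:%SZ"), size) for ts, size in uploads
    )
    rates = []
    for (prev_time, _), (cur_time, size) in zip(parsed, parsed[1:]):
        gap_s = (cur_time - prev_time).total_seconds()
        if gap_s > 0:  # skip uploads logged in the same second
            rates.append((cur_time, size / gap_s))
    return rates
```

Because idle time between uploads also counts toward the gap, a drop in these estimates is only suggestive; a bot that pauses more often would look "slower" even with an unchanged network.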
Comment 22 Neil Kandalgaonkar 2011-08-08 19:36:20 UTC
Mark: how far back does your chart go? Reedy believes this started to become an issue around July 23rd.

I'm more inclined to believe this is a real issue -- we're hearing about this from lots of people. It might be localized to Europe, like the last problems were.
Comment 23 Mark A. Hershberger 2011-08-08 19:38:15 UTC
I'll post a chart for as far back as I can once I've generated it.
Comment 24 Maarten Dammers 2011-08-09 00:59:25 UTC
I'm a user. I have a problem. I open an incident. Once the user confirms it's fixed, you close the incident; don't just close it *twice* because you think it's solved.

Commons upload is slow as hell, so yes, this is still an issue. So please, before you close this incident again: Verify with the user who reported this if it's really solved.
Comment 25 Mark A. Hershberger 2011-08-09 02:12:58 UTC
(In reply to comment #22)
> Mark: how far back does your chart go? Reedy believes this started to become an
> issue around July 23rd.

I have data going back to March, now.


(In reply to comment #24)
> Commons upload is slow as hell, so yes, this is still an issue.

I'm just trying to get some numbers to back up these data-less assertions.  I know people don't usually keep numbers like this handy, so I'm sympathetic to what you're saying.  However, objective numbers are more reliable than user reports of "slowness".  I'll work with NeilK to get some.
Comment 26 prolineserver 2011-08-09 11:55:37 UTC
I still get the following error:
Uploading file to commons:commons via API....
<urlopen error timed out>
WARNING: Could not open 'http://commons.wikimedia.org/w/api.php'. Maybe the server or your connection is down. Retrying in 1 minutes...
<urlopen error timed out>
WARNING: Could not open 'http://commons.wikimedia.org/w/api.php'. Maybe the server or your connection is down. Retrying in 2 minutes...
<urlopen error timed out>
WARNING: Could not open 'http://commons.wikimedia.org/w/api.php'. Maybe the server or your connection is down. Retrying in 4 minutes...
<urlopen error timed out>
WARNING: Could not open 'http://commons.wikimedia.org/w/api.php'. Maybe the server or your connection is down. Retrying in 8 minutes...
<urlopen error timed out>
WARNING: Could not open 'http://commons.wikimedia.org/w/api.php'. Maybe the server or your connection is down. Retrying in 16 minutes...
Comment 27 Mark A. Hershberger 2011-08-09 16:47:12 UTC
(In reply to comment #25)
> I have data going back to March, now.

And now, back to 2009 for BotMultichillT. I've posted the raw data at http://mah.everybody.org/chart.zip (8 MB). I have also asked a researcher if she could help with visualizing the data.

There are problems with it, so I'm going to see if I can clean it up some. I also saw problems with the API while generating the report.
Comment 28 Maarten Dammers 2011-08-09 18:40:03 UTC
I did some tests using my office PC (very fast uplink). I downloaded two +/- 10 MB files from http://www.openbeelden.nl/ . That took about 1 second.

Uploading a file through the upload wizard took about 6 minutes = 30 KB/sec 
Uploading a file through the old upload seems to take about the same amount of time.

I'm coming from Europe (AS1103 to be exact). I wonder if someone from the USA could do the same test (download a file from http://www.openbeelden.nl/ and upload it to Commons and time it) to see if the problem might be location related.
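The test above compares how long the same file takes to travel in each direction. A minimal sketch of timing the upload leg with the standard library follows, assuming you point it at an HTTP endpoint that accepts a plain POST; it does not perform a real MediaWiki upload (the upload API additionally needs an edit token and multipart form data), so it measures only the network and HTTP path. The function name is hypothetical.

```python
import time
import urllib.request


def timed_post_kib_s(url: str, payload: bytes, timeout_s: float = 600.0) -> float:
    """POST `payload` to `url` and return the observed rate in KiB/s.

    The elapsed time covers connection setup, sending the body and reading
    the response, so it slightly understates pure upload throughput.
    """
    request = urllib.request.Request(
        url,
        data=payload,
        method="POST",
        headers={"Content-Type": "application/octet-stream"},
    )
    start = time.perf_counter()
    with urllib.request.urlopen(request, timeout=timeout_s) as response:
        response.read()
    elapsed_s = time.perf_counter() - start
    return len(payload) / 1024 / elapsed_s
```

Running the same payload from machines in different regions against the same endpoint is one way to check whether the slowdown is location related, as asked above.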
Comment 29 Sam Reed (reedy) 2011-08-09 18:49:32 UTC
(In reply to comment #28)
> 
> I'm coming from Europe (AS1103 to be exact). I wonder if someone from the USA
> could do the same test (download a file from http://www.openbeelden.nl/ and
> upload it to Commons and time it) to see if the problem might be location
> related.

I did note that before somewhere (possibly on another bug): all the reporters seemed EU-based, but it wasn't necessarily a complete survey.
Comment 30 inductiveload 2011-08-09 19:33:18 UTC
(In reply to comment #27)

> And now, back to 2009 for BotMultichillT.  

A simple graph of the data from 2008-2011 with a 1000-element moving average can be seen at http://commons.wikimedia.org/wiki/File:Commons_upload_speeds_2008-2011.png

A graph of the data from 2011 with a 100-element moving average can be seen at http://commons.wikimedia.org/wiki/File:Commons_upload_speeds_2011.png

The moving average is very buggy and a couple of very fast outliers distort it badly, but a dramatic reduction can be seen firstly in January this year and again in July.

If it helps, I am based in the UK, but I have heard about this problem from American editors too.
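A trailing mean is pulled up by a few very fast outliers, which is the distortion described above. A minimal sketch comparing it with a trailing median, a common outlier-robust alternative, follows; the median variant is a swapped-in technique for illustration, not what was used to draw the linked graphs.

```python
from statistics import median
from typing import List, Sequence


def moving_average(values: Sequence[float], window: int) -> List[float]:
    """Trailing mean over the last `window` values (fewer at the start)."""
    out, total = [], 0.0
    for i, value in enumerate(values):
        total += value
        if i >= window:
            total -= values[i - window]  # drop the value leaving the window
        out.append(total / min(i + 1, window))
    return out


def moving_median(values: Sequence[float], window: int) -> List[float]:
    """Trailing median over the last `window` values; robust to a few outliers."""
    return [median(values[max(0, i - window + 1): i + 1]) for i in range(len(values))]


rates = [10, 10, 10, 1000, 10, 10]  # KiB/s, one fast outlier
print(moving_average(rates, 3))     # the outlier lifts three points to 340.0
print(moving_median(rates, 3))      # stays at 10
```

With a 100- or 1000-element window the same effect applies: a handful of outliers shifts the mean for the whole window, while the median moves only if outliers make up about half the window.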
Comment 31 Neil Kandalgaonkar 2011-08-09 20:40:42 UTC
Just tried it from the USA. Uploading an 8.4 MB file to Commons took about 5 minutes 20 seconds (320 seconds), which works out to about 26 KB/sec.

Download (from http://www.openbeelden.nl/) was much faster, and took about 30 seconds.

Re: the graph -- the reduction in upload speed might coincide with when we introduced UploadWizard. It may be that the API method has always been slower. That would not explain why upload speed seems to have slowed dramatically in recent months, though, since we haven't altered anything about the upload protocol recently.

Multichill -- I would like to see the same graph with outliers removed, if you please?
Comment 32 Mark A. Hershberger 2011-08-10 02:38:16 UTC
(In reply to comment #31)
> Just tried it from the USA. Uploading an 8.4MB file to Commons took about 5
> minutes 20 seconds (320 seconds). So that should be about 26Kb/sec upload.
> 
> Download (from http://www.openbeelden.nl/) was much faster, and took about 30
> seconds.

Upload vs download is, of course, not the same and depends on your provider.
Comment 33 Maarten Dammers 2011-08-10 20:54:40 UTC
We did some debugging last night. The chain when uploading is:

Me -> Europe squid -> US squid -> application server (apache) -> NFS -> ms7

Multiple people on different continents have this problem so it's probably not the Europe squids.
NFS copy from the apache to the nfs share on ms7 is fast so that doesn't seem to be the bottleneck either.
Upload to http://test.wikipedia.org is fast, but upload to secure test is very slow (even slower than Commons).
Unsecure test uses different apaches than secure test or secure/unsecure Commons.

Could an operations person please look into this? Bumping this to highest because Commons is becoming unusable. Lots of reports are coming in.
Comment 34 Roan Kattouw 2011-08-10 20:56:46 UTC
(In reply to comment #33)
> Upload to http://test.wikipedia.org is fast, but upload to secure test is very
> slow (even slower than Commons).
Slower than Commons, really? How does it compare to Commons via secure? I just wanna know whether we really are on to something here or whether we're just noticing a 'tax' being added by the secure gateway.
Comment 35 Ryan Kaldari 2011-08-10 23:02:42 UTC
I just tried uploading a 3.06MB file from the Toolserver to Commons via the API. It took a little over 2 minutes, so roughly equivalent to the speed Neil was reporting.
Comment 36 Juan Sebastian Quintero Santacruz 2011-08-10 23:23:23 UTC
I'm from Colombia and I have the same problem.
Comment 37 Chad H. 2011-08-11 01:21:16 UTC
We realize this issue is affecting many users and we're looking into various causes of the problem. If people could avoid "me too" style comments that would help keep the signal:noise ratio down.
Comment 38 Maarten Dammers 2011-08-11 05:46:51 UTC
Chad: That's what you get if a problem like this stays open for more than two weeks.

Who from the operations team is actually working on getting this fixed right now? The bug is assigned to Sam and AFAIK he's not on it.
Comment 39 Chad H. 2011-08-11 12:16:04 UTC
(In reply to comment #38)
> Chad: That's what you get if an problem like this stays open for more than two
> weeks.
> 

I understand the problem is frustrating, but +1s don't help :)

> Who from the operations team is actually working on getting this fixed right
> now? The bug is assigned to Sam and AFAIK he's not on it.
>

I was working with Roan, Sam, RobLa, and Asher last night on this (so there's 4 people plus me). We added some additional profiling late last night that today should give us some more insights.
Comment 40 Étienne Beaulé 2011-08-12 17:04:36 UTC
Commons Helper is going so slow because of Commons. Ebe123
Comment 41 Étienne Beaulé 2011-08-12 17:09:36 UTC
Commons Helper is going so slow because of Commons. Ebe123

(In reply to comment #22)
> Mark: how far back does your chart go? Reedy believes this started to become an
> issue around July 23rd.
> 
> I'm more inclined to believe this is a real issue -- we're hearing about this
> from lots of people. It might be localized to Europe, like the last problems
> were.

I'm in Halifax, and it is taking forever.  Ebe123
Comment 42 Sam Reed (reedy) 2011-08-12 17:56:01 UTC
(In reply to comment #41)
> Commons Helper is going so slow because of commons.  Ebe123(In reply to comment
> #22)
> > Mark: how far back does your chart go? Reedy believes this started to become an
> > issue around July 23rd.
> > 
> > I'm more inclined to believe this is a real issue -- we're hearing about this
> > from lots of people. It might be localized to Europe, like the last problems
> > were.
> 
> I'm in Halifax, and it is taking forever.  Ebe123

Halifax, Yorkshire or Halifax Canada?

Also, is this still an issue for you? Operations believe this should be fixed (and was in their tests) as of a couple of hours ago.
Comment 43 Ryan Kaldari 2011-08-12 23:18:50 UTC
Uploading speed via the API seems to be about 3 times faster now. It would be nice if a baseline speed were defined which could be tested against (or a range of speeds), so that we don't have to rely on people deciding that uploading is just "too slow" and filing a bug before anyone takes notice.
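One way to act on the baseline suggestion is a small check that flags measured runs falling well below an agreed reference rate. A minimal sketch follows; the class and function names are hypothetical, and the 500 KiB/s baseline and 50% tolerance in the usage note are placeholders, not figures agreed in this thread.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class UploadRun:
    label: str
    size_kib: float
    seconds: float

    @property
    def rate_kib_s(self) -> float:
        return self.size_kib / self.seconds


def slow_runs(
    runs: Iterable[UploadRun], baseline_kib_s: float, tolerance: float = 0.5
) -> List[str]:
    """Labels of runs slower than `tolerance` times the agreed baseline."""
    floor = baseline_kib_s * tolerance
    return [run.label for run in runs if run.rate_kib_s < floor]
```

For example, with `baseline_kib_s=500`, a run of 8600 KiB in 320 s (about 27 KiB/s) would be flagged, while 17000 KiB in 10 s (1700 KiB/s) would pass. Collecting such runs periodically from a few locations would turn "too slow" reports into numbers that can be compared over time.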
Comment 44 Maarten Dammers 2011-08-13 10:03:01 UTC
Over the last couple of days several people went through the whole cluster trying to pinpoint the bottleneck. The Squids were ruled out, as were ms7 and NFS. It ended up being a low-level problem:

[11:57]	mark	it was a nasty problem with TSO/GRO being broken with linux 802.1q tagged interfaces
[11:57]	multichill	So really low level problem?
[11:58]	mark	yeah
[11:58]	mark	so, the nic on lvs4 was reassembling tcp packets into jumbo packets before presenting them to the OS
[11:58]	mark	after which LVS would forward them
[11:58]	mark	and then they wouldn't be split back up again by the nic after sending out
[11:58]	multichill	And fragmentation?
[11:58]	mark	and dropped as jumbo packets
[11:58]	mark	so, tcp delays, icmp "frag needed" messages being sent
[11:58]	mark	really hard to see because on the wire, they were < 1500 byte packages as usual
[12:00]	mark	the fix was disabling GRO on all lvs servers
[12:00]	mark	no idea why it was on by default anyway, on most servers it isn't
[12:00]	mark	probably some nic drivers enable it, most don't
[12:01]	mark	i bet TSO wasn't happening because of the added 802.1q vlan tag
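As described above, GRO on the LVS NIC merged incoming TCP segments into large packets, LVS forwarded them without splitting them back up, and the oversized packets were dropped; the fix was to disable GRO. On Linux, `ethtool -k <iface>` lists offload features and `ethtool -K <iface> gro off` turns GRO off. A minimal sketch follows, assuming Linux with ethtool installed: it parses `ethtool -k` output and includes a deliberately simplified size model (fixed 40-byte IPv4+TCP headers, naive merging) to show why merged packets exceed a 1500-byte MTU. The function names are hypothetical, and real GRO merging rules are more involved.

```python
import subprocess
from typing import Dict, List


def parse_offloads(ethtool_k_output: str) -> Dict[str, bool]:
    """Map offload feature name -> enabled, from `ethtool -k <iface>` output."""
    features: Dict[str, bool] = {}
    for line in ethtool_k_output.splitlines():
        name, sep, rest = line.partition(":")
        if not sep or not rest.strip():
            continue  # e.g. the "Features for eth0:" header line
        state = rest.split()[0]  # "on"/"off", possibly followed by "[fixed]"
        if state in ("on", "off"):
            features[name.strip()] = state == "on"
    return features


def gro_enabled(iface: str) -> bool:
    """Run `ethtool -k` (Linux, needs ethtool installed) and report GRO state."""
    result = subprocess.run(
        ["ethtool", "-k", iface], capture_output=True, text=True, check=True
    )
    return parse_offloads(result.stdout).get("generic-receive-offload", False)


def merged_packet_sizes(payloads: List[int], per_packet: int, headers: int = 40) -> List[int]:
    """Simplified model: IP packet sizes if up to `per_packet` TCP payloads are
    merged into one packet (40 bytes = IPv4 + TCP headers without options)."""
    return [
        sum(payloads[i:i + per_packet]) + headers
        for i in range(0, len(payloads), per_packet)
    ]
```

In this model, four 1460-byte segments stay at 1500 bytes each without merging, but become a single 5880-byte packet when merged, which a 1500-byte link cannot carry unless something resegments it.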

Thanks everyone for debugging this problem. I confirmed on Commons that upload is fast again (17MB file uploaded in less than 10 seconds).

Closing this bug as resolved.
Comment 45 Étienne Beaulé 2011-08-14 14:15:14 UTC
(In reply to comment #42)
> (In reply to comment #41)
> > Commons Helper is going so slow because of commons.  Ebe123(In reply to comment
> > #22)
> > > Mark: how far back does your chart go? Reedy believes this started to become an
> > > issue around July 23rd.
> > > 
> > > I'm more inclined to believe this is a real issue -- we're hearing about this
> > > from lots of people. It might be localized to Europe, like the last problems
> > > were.
> > 
> > I'm in Halifax, and it is taking forever.  Ebe123
> 
> Halifax, Yorkshire or Halifax Canada?
> 
> Also, this is still an issue for you? Operations believe this should be fixed
> (and was in their tests) as of a couple of hours ago

Canada, the capital of Nova Scotia.
Comment 46 Étienne Beaulé 2011-08-14 14:16:06 UTC
(In reply to comment #42)
> (In reply to comment #41)
> > Commons Helper is going so slow because of commons.  Ebe123(In reply to comment
> > #22)
> > > Mark: how far back does your chart go? Reedy believes this started to become an
> > > issue around July 23rd.
> > > 
> > > I'm more inclined to believe this is a real issue -- we're hearing about this
> > > from lots of people. It might be localized to Europe, like the last problems
> > > were.
> > 
> > I'm in Halifax, and it is taking forever.  Ebe123
> 
> Halifax, Yorkshire or Halifax Canada?
> 
> Also, this is still an issue for you? Operations believe this should be fixed
> (and was in their tests) as of a couple of hours ago

Canada, the capital of Nova Scotia. It's still an issue.
Comment 47 Roan Kattouw 2011-08-14 14:22:12 UTC
(In reply to comment #46)
> Canada, the capital of nova scotia.  Its still an issue.
So how slow are uploads for you?
Comment 48 Maarten Dammers 2011-08-14 21:29:42 UTC
I doubt this is server side. See for example how fast https://secure.wikimedia.org/wikipedia/commons/w/index.php?title=Special:ListFiles&user=US+National+Archives+bot is going.

Etienne: How fast is your internet connection (upload and download)? What file size are you trying to upload, and how long did it take?

Etienne: You are https://secure.wikimedia.org/wikipedia/commons/w/index.php?title=Special:ListFiles&user=Ebe123 right? What tool do you use for that? Maybe the tool is just slow (I know commonshelper can be very slow).....
Comment 49 Smallman 2011-11-02 20:13:54 UTC
Uploads to https://commons.wikimedia.org/w/api.php timeout (response times out) whereas uploads to https://secure.wikimedia.org/wikipedia/commons/w/api.php work fine.
Comment 50 Neil Kandalgaonkar 2011-11-02 21:31:38 UTC
Smallman: other people are using the API successfully... I think that issue has to be either transient or local to your own situation. 

We can't just keep reopening the same bug any time somebody has a network issue connecting to Commons.
Comment 51 Neil Kandalgaonkar 2011-11-02 21:34:29 UTC
I just want to clarify: I'm not saying your problem isn't real. I'm saying that we can't keep abusing Bugzilla so that we keep reopening the same bug for any and all network issues.

Please document your issue in a way we can replicate. Your issue seems to be some asymmetry between secure.wikimedia.org and https://commons, which might be a problem, but it's not THIS problem.
Comment 52 Mark A. Hershberger 2011-11-04 16:30:01 UTC
(In reply to comment #49)
> Uploads to https://commons.wikimedia.org/w/api.php timeout (response times out)
> whereas uploads to https://secure.wikimedia.org/wikipedia/commons/w/api.php
> work fine.

That is a different problem than the one described here.  Please open a new bug.
Comment 53 Maarten Dammers 2011-11-15 14:58:53 UTC
Resolved invalid? Don't think so. This was fixed back in August.
