Last modified: 2013-05-13 17:53:41 UTC
The server http://bits.wikimedia.org/ has been extremely slow for two days. Requests almost never return anything; they time out instead. This leaves all MediaWiki projects (including Commons) without any CSS (except for my user CSS). Maybe a DNS issue? Is there a DoS going on? I'm sure this is not an issue on my side, because I tested it on different computers using different internet connections. It's the same everywhere. I'm in Germany. Here is the relevant part of a tracert:
    C:\>tracert bits.wikimedia.org
    Tracing route to bits-lb.esams.wikimedia.org [91.198.174.233]:
    [...]
      8    50 ms    51 ms    52 ms  ge0-1-0-cr0.ixf.de.as6908.net [80.81.192.244]
      9    58 ms    56 ms    59 ms  xe-5-1-0-core0.nknik.nl.as6908.net [62.149.50.42]
     10    54 ms    55 ms    54 ms  xe-0-0-1.cr2-knams.wikimedia.org [78.41.155.38]
     11    57 ms    56 ms    56 ms  bits-lb.esams.wikimedia.org [91.198.174.233]
    Trace complete.
I can't explain why the tracert looks so good. Requesting any bits URL in the browser almost always times out.
It seems to be completely down at the moment: "Failed to load resource: the server responded with a status of 503 (Service Unavailable)"
To let you know: it's much better now, but not solved. Currently it feels like 5% of the requests in the German Wikipedia time out. Nothing happens for a minute, and then a "server does not respond" error is shown. When I try again, it works most of the time. Some edits have been lost because of this. Multiple users have reported the same problem.
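If failures really are independent, this explains why "try again" mostly works. A minimal Python sketch under that assumption (an outage that hits every request at once would break it): with the roughly 5% failure rate estimated above, two attempts both fail about 0.05² = 0.25% of the time. `with_retries` and `all_attempts_fail` are hypothetical helpers, not part of MediaWiki or any tool named in this thread.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def with_retries(fetch: Callable[[], T], attempts: int = 3, delay_s: float = 0.0) -> T:
    """Call fetch() up to `attempts` times; re-raise the last network error if all fail."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_exc: Optional[OSError] = None
    for i in range(attempts):
        try:
            return fetch()
        except OSError as exc:  # TimeoutError and urllib's URLError are OSError subclasses
            last_exc = exc
            if i + 1 < attempts:
                time.sleep(delay_s)
    raise last_exc  # every attempt failed


def all_attempts_fail(p_fail: float, attempts: int) -> float:
    """Chance that every attempt fails, ASSUMING failures are independent."""
    return p_fail ** attempts
```

Note that this only helps with sporadic timeouts like the ones described here; it does nothing for a full outage.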
Raising priority then, adding the 'ops' keyword.
(In reply to comment #2)
> Multiple users reported the same problem.
URLs are welcome, as I haven't seen anything on the usual Commons forums that I try to follow.
There was an outage on Wednesday, 13:30 - 14:00 UTC, due to a memcached server going offline. "As usual this caused all kinds of cascading failures on other clusters such as Squid/Varnish. When not overloaded, these clusters would only serve cached pages at that point." That would not explain "since two days", but it is what people immediately mentioned when I brought up this bug report in the operations channel.
I'm currently also in Germany, and I ran "mtr" on my Linux machine for a while:
    My traceroute  [v0.82]
    embrace.foo (0.0.0.0)                                    Fri May 10 02:59:33 2013
    Resolver: Received error response 2. (server failure)
                                                    Packets               Pings
        Host                                    Loss%   Snt   Last   Avg  Best  Wrst StDev
     1. fritz.box                                0.0%   158    1.1   9.1   1.0 606.0  58.0
     2. 217.0.117.142                            0.0%   158   20.1  30.4  19.1 529.5  48.8
     3. 87.186.195.6                             0.0%   158   22.3  31.6  20.4 434.4  39.0
     4. hh-ea4-i.HH.DE.NET.DTAG.DE               0.6%   158   27.7  41.4  27.0 338.0  35.6
     5. 194.25.208.234                           0.0%   158   30.2  45.8  27.7 1023.  81.4
        80.156.160.242
        80.150.168.162
        80.156.163.126
     6. hbg-bb1-link.telia.net                   0.0%   158   27.4  48.4  27.2 999.9  80.3
        hbg-bb1-link.telia.net
        hbg-bb1-link.telia.net
     7. adm-bb3-link.telia.net                   0.0%   158   33.5  50.3  32.6 1029. 102.1
        adm-bb3-link.telia.net
        adm-bb3-link.telia.net
        adm-bb3-link.telia.net
        adm-bb3-link.telia.net
        adm-bb3-link.telia.net
        adm-bb3-link.telia.net
        adm-bb3-link.telia.net
     8. adm-b5-link.telia.net                    0.0%   158   35.1  51.0  33.9 943.4  92.1
     9. wikimedia-ic-129908-adm-b3.c.telia.net   5.7%   158   37.0  49.0  34.6 846.6  68.5
    10. bits.esams.wikimedia.org                 0.0%   158   35.2  47.8  35.2 746.2  69.9
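For reference, the mtr columns are per-hop summaries of repeated probes: Loss% is the share of probes without a reply, Snt the number sent, and Last/Avg/Best/Wrst/StDev are round-trip times in milliseconds. A minimal Python sketch of that kind of summary, assuming a list of RTTs with None for lost probes; `summarize` is a hypothetical helper, and mtr's own bookkeeping (for example its StDev formula) may differ.

```python
import math
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class HopStats:
    loss_pct: float
    sent: int
    last: Optional[float]   # RTT of the most recent probe that got a reply
    avg: Optional[float]
    best: Optional[float]
    worst: Optional[float]
    stdev: Optional[float]


def summarize(rtts_ms: Sequence[Optional[float]]) -> HopStats:
    """Summarize one hop's probes; None marks a probe that got no reply."""
    sent = len(rtts_ms)
    got = [r for r in rtts_ms if r is not None]
    loss = 100.0 * (sent - len(got)) / sent if sent else 0.0
    if not got:
        return HopStats(loss, sent, None, None, None, None, None)
    avg = sum(got) / len(got)
    # Population standard deviation (an assumption; mtr's exact formula may differ).
    stdev = math.sqrt(sum((r - avg) ** 2 for r in got) / len(got))
    return HopStats(loss, sent, got[-1], avg, min(got), max(got), stdev)
```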
(In reply to comment #0)
> The server http://bits.wikimedia.org/ is insanely slow since two days. Requests
> almost never return anything. The requests time out instead. This leaves all
> Mediawiki projects (including Commons) naked without any CSS (except for my
> user CSS).
I would note that your user CSS is served via bits. Which URLs specifically are timing out, or is it random?
> There was an outage on Wednesday, 13:30 - 14:00 UTC, due to a memcached server going offline.
Shouldn't these sorts of things show up in the server admin log?
I already answered this in bug 42653 (comments 14-17), but I will repeat the key points here. It seems that HTTP on bits-lb.esams.wikimedia.org is broken. The IP itself answers ping, and HTTPS links work fine. For example, this works:
    curl -i https://bits.wikimedia.org/
This fails most of the time:
    curl -i http://bits.wikimedia.org/
The error is:
    curl: (7) Failed to connect to 2620:0:862:ed1a::a: Network is unreachable
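The two curl checks above can be repeated in bulk to see how often each scheme fails. A rough standard-library sketch, assuming Python 3.10+ (where socket timeouts surface as TimeoutError); `probe` is a hypothetical helper, and the 10-second default timeout is an arbitrary choice, not anything curl or Wikimedia uses.

```python
import time
import urllib.error
import urllib.request
from typing import Optional, Tuple


def probe(url: str, timeout_s: float = 10.0) -> Tuple[str, Optional[int], float]:
    """Fetch url once; return (outcome, HTTP status or None, elapsed seconds).

    outcome is one of "ok", "http_error", "timeout" or "error".
    """
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout_s) as resp:
            resp.read()
            return "ok", resp.status, time.monotonic() - start
    except urllib.error.HTTPError as exc:  # server answered, e.g. with 503
        return "http_error", exc.code, time.monotonic() - start
    except urllib.error.URLError as exc:  # connect-stage failure, wrapped by urllib
        kind = "timeout" if isinstance(exc.reason, TimeoutError) else "error"
        return kind, None, time.monotonic() - start
    except TimeoutError:  # timeout while reading the response body
        return "timeout", None, time.monotonic() - start
    except OSError:  # e.g. connection reset
        return "error", None, time.monotonic() - start
```

Calling probe(f"{scheme}://bits.wikimedia.org/") for scheme in ("http", "https") in a loop and tallying the outcomes would show whether failures cluster on one scheme, as described in this comment.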
Some links to discussions:
- https://en.wikipedia.org/wiki/Wikipedia:Help_desk#Problems_getting_pages_to_load
- http://fi.wikipedia.org/wiki/Wikipedia:Kahvihuone_%28tekniikka%29#Wikipedian_hidastelu
Out of curiosity, does
    curl -i -4 http://bits.wikimedia.org/
also give you errors?
Yes
(In reply to comment #9)
> Yes
To clarify, does it give the same error? (It definitely should not.)
> To clarify, does it give the same error (it definitely should not)
The error message is:
    curl -i -4 http://bits.wikimedia.org/
    curl: (7) Failed connect to bits.wikimedia.org:80; Connection timed out
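The two curl errors point at different layers: "Network is unreachable" came from an IPv6 address (the client apparently had no IPv6 route), while the -4 run reached the TCP connect stage and got no answer. A sketch that repeats only the TCP connect for one address family, roughly what curl's -4 and -6 switches select; `connect_by_family` is a hypothetical helper, and the 5-second timeout is arbitrary.

```python
import errno
import socket
from typing import List, Tuple


def connect_by_family(host: str, port: int, family: int,
                      timeout_s: float = 5.0) -> List[Tuple[str, str]]:
    """Try a plain TCP connect to each address of one family; return (address, outcome)."""
    try:
        infos = socket.getaddrinfo(host, port, family, socket.SOCK_STREAM)
    except socket.gaierror:
        return [("-", "no address for this family")]
    results = []
    for fam, socktype, proto, _canon, sockaddr in infos:
        with socket.socket(fam, socktype, proto) as s:
            s.settimeout(timeout_s)
            try:
                s.connect(sockaddr)
                outcome = "connected"
            except TimeoutError:  # SYN sent, nothing came back in time
                outcome = "timed out"
            except ConnectionRefusedError:  # host answered with a reset
                outcome = "refused"
            except OSError as exc:
                if exc.errno == errno.ENETUNREACH:  # no route for this family
                    outcome = "network unreachable"
                else:
                    outcome = f"error: {exc}"
        results.append((sockaddr[0], outcome))
    return results
```

For example, compare connect_by_family("bits.wikimedia.org", 80, socket.AF_INET) with the same call using socket.AF_INET6.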
And when the connection works, the response is practically instant. So it is not that the HTTP server is too slow; it either works or it doesn't. Example response from an HTTP query that worked:
    time curl -i -4 http://bits.wikimedia.org/
    HTTP/1.1 200 OK
    Server: Apache
    Last-Modified: Thu, 12 Aug 2010 16:12:20 GMT
    ETag: "b2-48da2a1772100"
    Content-Type: text/html
    X-Varnish: 1991165982
    Via: 1.1 varnish
    Content-Length: 178
    Accept-Ranges: bytes
    Date: Fri, 10 May 2013 06:05:39 GMT
    X-Varnish: 3599832084
    Age: 0
    Via: 1.1 varnish
    Connection: keep-alive
    X-Cache: sq67 miss (0), cp3022 miss (0)

    <html>
    <head><title>bits and pieces</title>
    <meta http-equiv="refresh" content="1;url=http://www.wikimedia.org/" />
    </head>
    <body>
    bits and pieces live here!
    </body>
    </html>

    real    0m0.281s
    user    0m0.004s
    sys     0m0.004s
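When comparing working and failing requests, the X-Cache header shows which cache hosts handled a response (here sq67 and cp3022). A small parsing sketch: the `host status (n)` layout is inferred from this one response, and since the meaning of the number in parentheses isn't explained in this thread, it is kept as a raw integer. `parse_x_cache` is a hypothetical helper.

```python
import re
from typing import List, Tuple

# One comma-separated entry, e.g. "cp3022 miss (0)"; layout inferred from a single sample.
_ENTRY = re.compile(r"^\s*(\S+)\s+(\S+)\s+\((\d+)\)\s*$")


def parse_x_cache(value: str) -> List[Tuple[str, str, int]]:
    """Split an X-Cache header value into (cache host, status, number) entries."""
    entries = []
    for part in value.split(","):
        m = _ENTRY.match(part)
        if m is None:
            raise ValueError(f"unrecognized X-Cache entry: {part!r}")
        host, status, n = m.groups()
        entries.append((host, status, int(n)))
    return entries
```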
At the moment all Wikimedia projects are practically dead and unusable because of this. Here are some example URLs that all time out:
http://bits.wikimedia.org/de.wikipedia.org/load.php?debug=false&lang=de&modules=startup&only=scripts&skin=vector&*
http://bits.wikimedia.org/commons.wikimedia.org/load.php?debug=false&lang=de&modules=startup&only=scripts&skin=vector&*
http://bits.wikimedia.org/en.wikipedia.org/load.php?debug=false&lang=en&modules=ext.gadget.ReferenceTooltips%2Ccharinsert%2Ctoolbaralert2%7Cext.wikihiero%7Cmediawiki.legacy.commonPrint%2Cshared%7Cmw.PopUpMediaTransform%7Cskins.vector&only=styles&skin=vector&*
It's like comment #12 said: some requests to bits.wikimedia.org return immediately, some take a very long time (about 30 seconds), and some never return (time out). Again, I'm in Germany.
    C:\>tracert bits.wikimedia.org
    Tracing route to bits-lb.esams.wikimedia.org [91.198.174.233]:
    [...]
      8    53 ms    51 ms    51 ms  ge0-1-0-cr0.ixf.de.as6908.net [80.81.192.244]
      9    58 ms    57 ms    57 ms  xe-5-1-0-core0.nknik.nl.as6908.net [62.149.50.42]
     10    55 ms    55 ms    55 ms  xe-0-0-1.cr2-knams.wikimedia.org [78.41.155.38]
     11    59 ms    55 ms    57 ms  bits-lb.esams.wikimedia.org [91.198.174.233]
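The `modules` parameter in these load.php URLs is a packed list: `|` (%7C) separates groups, and within a group `,` (%2C) lists further names that share the prefix before the last dot. A sketch of the expansion, written from an understanding of ResourceLoader's format rather than copied from its source, so treat the rules as an assumption; both helper names are hypothetical.

```python
from typing import List
from urllib.parse import parse_qs, urlsplit


def expand_module_names(packed: str) -> List[str]:
    """Expand a packed module list such as 'a.b,c|d.e' into ['a.b', 'a.c', 'd.e']."""
    if not packed:
        return []
    modules: List[str] = []
    for group in packed.split("|"):
        if "," not in group:
            modules.append(group)
            continue
        # Assumed rule: everything before the last dot is shared by all comma-separated names.
        prefix, dot, tail = group.rpartition(".")
        if not dot:
            modules.extend(group.split(","))
        else:
            modules.extend(f"{prefix}.{suffix}" for suffix in tail.split(","))
    return modules


def modules_from_url(url: str) -> List[str]:
    """Return the expanded module names requested by a load.php URL."""
    query = parse_qs(urlsplit(url).query)  # also decodes %2C and %7C
    return expand_module_names(query.get("modules", [""])[0])
```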
Created attachment 12291
Bits and Meta subdomain requests time out
Here is a screenshot from the Opera Dragonfly debugger. Please note that it's not only bits.wikimedia.org (all the URLs that start with load.php); some meta.wikimedia.org URLs time out as well.
(In reply to comment #6)
> https links are working fine.
Wow, you are right. The problem is immediately solved when I switch from http to https. I guess this is why most users can't reproduce my problem.
https://de.wikipedia.org/wiki/Wikipedia:Fragen_zur_Wikipedia#Wikipedia-Server_sterbenslahm
And now both http and https have the same problem and are unusable. Guys, what's going on? Here are some example https URLs that time out:
https://bits.wikimedia.org/commons.wikimedia.org/load.php?debug=false&lang=de&modules=startup&only=scripts&skin=vector&*
https://bits.wikimedia.org/commons.wikimedia.org/load.php?debug=false&lang=de&modules=site&only=scripts&skin=vector&*
https://de.wikipedia.org/
Raising importance.
(In reply to comment #16)
> And now both http and https have the same problem and are unusable.
That may or may not be related. Here in Italy, HTTPS has been down for me since about 20 minutes ago, while HTTP sometimes loads after a long time (with or without styles).
Another easy personal workaround is to switch to Google's public DNS servers, so that bits.wikimedia.org resolves to bits-lb.eqiad.wikimedia.org, which works fine. This is also one reason why the problem occurs mainly in Europe.
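Which data center a client ends up at depends on the DNS answer, which is what the workaround above exploits. A small sketch that asks the system resolver for a host and reads the site label out of the canonical name. Assumptions: the `<service>.<site>.wikimedia.org` naming is inferred from the hostnames in this thread (bits-lb.esams, bits-lb.eqiad); `gethostbyname_ex` only returns IPv4 addresses; and comparing resolvers means changing the system's DNS setting, since querying a specific server directly would need a DNS library not shown here. Both helpers are hypothetical.

```python
import socket
from typing import List, Optional, Tuple


def datacenter_of(canonical_name: str) -> Optional[str]:
    """Pull the site label out of a name like 'bits-lb.esams.wikimedia.org' (assumed convention)."""
    labels = canonical_name.lower().rstrip(".").split(".")
    if len(labels) >= 4 and labels[-2:] == ["wikimedia", "org"]:
        return labels[-3]
    return None


def resolve(host: str) -> Tuple[str, Optional[str], List[str]]:
    """Return (canonical name, site label, IPv4 addresses) from the system resolver."""
    canonical, _aliases, addresses = socket.gethostbyname_ex(host)
    return canonical, datacenter_of(canonical), addresses
```

For example, resolve("bits.wikimedia.org") run once with the default resolver and once after switching the system DNS would show whether the site label changes.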
The URLs in comment 13 and comment 16 load fine for me in Firefox 18 (also when using http:// instead of https://), no matter how often I reload, and I am currently based in Germany too. I assume you bypass the cache when reloading these URLs? http://en.wikipedia.org/wiki/Wikipedia:Bypass_your_cache
Summarizing the aforementioned VP/forum threads (thanks for the links!):
* https://en.wikipedia.org/wiki/Wikipedia:Help_desk#Problems_getting_pages_to_load: the reporter states today that it is "Now very suddenly working again".
* Both https://de.wikipedia.org/wiki/Wikipedia:Fragen_zur_Wikipedia#Wikipedia-Server_sterbenslahm and http://fi.wikipedia.org/wiki/Wikipedia:Kahvihuone_%28tekniikka%29#Wikipedian_hidastelu also imply that only http:// is affected, but the Finnish thread has had no new comments since May 8th, when the outage happened (see comment 4), so it's unclear whether there are still bigger problems.
As I don't yet see indications that a large number of users in Europe are affected, I'll set this back to "highest" priority and "critical".
(In reply to comment #19)
> Firefox
I'm sure the browser does not matter. I tried both Firefox and Opera.
> I assume you bypass the cache when trying to reload these URLs?
Yes, I know about that and tried everything. In this case bypassing the browser cache made the problem worse. I tried the opposite, forcing the browser never to reload these resources if they are in the cache, but there seems to be no setting for this. As far as I understand, the browser always sends a conditional request to check whether the cached resources have changed. Some of these requests timed out.
Currently everything seems to work, both http and https. I still think there was an overload, maybe caused by a DoS. We will see if the problem comes back every 24 hours.
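Browsers normally revalidate a cached resource with a conditional GET carrying If-None-Match (the ETag) and/or If-Modified-Since; a 304 Not Modified reply means the cached copy can be reused. A minimal sketch of such a request, using as an example the ETag from the bits response quoted in an earlier comment; `revalidate` is a hypothetical helper, not how any particular browser implements this.

```python
import urllib.error
import urllib.request
from typing import Dict, Optional


def revalidate(url: str, etag: Optional[str] = None, last_modified: Optional[str] = None,
               timeout_s: float = 10.0) -> int:
    """Send a conditional GET; 304 means the cached copy is still valid."""
    headers: Dict[str, str] = {}
    if etag:
        headers["If-None-Match"] = etag
    if last_modified:
        headers["If-Modified-Since"] = last_modified
    req = urllib.request.Request(url, headers=headers)
    try:
        with urllib.request.urlopen(req, timeout=timeout_s) as resp:
            return resp.status  # resource changed (or validators were ignored)
    except urllib.error.HTTPError as exc:
        # urllib reports 304 Not Modified as an HTTPError rather than a normal response.
        return exc.code
```

For example, revalidate("http://bits.wikimedia.org/", etag='"b2-48da2a1772100"') would return 304 as long as that ETag still matches the server's copy.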
We migrated the network in Europe (esams) to a new topology on Friday (May 10th), which probably also explains why this hasn't happened since.