Last modified: 2013-08-14 22:04:56 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and for historical purposes. It is not possible to log in and except for displaying bug reports and their history, links might be broken. See T54776, the corresponding Phabricator task for complete and up-to-date bug report information.
Bug 52776 - Beta labs getting 503 when you attempt to log in
Status: RESOLVED FIXED
Product: Wikimedia Labs
Classification: Unclassified
Component: deployment-prep (beta)
Version: unspecified
Hardware: All All
Importance: Highest critical
Target Milestone: ---
Assigned To: Nobody - You can work on this!
Reported: 2013-08-13 03:05 UTC by Michelle Grover
Modified: 2013-08-14 22:04 UTC (History)

Description Michelle Grover 2013-08-13 03:05:25 UTC
Getting 503 errors when I try to log in on mobile or desktop on betalabs.  I checked the logs and I don't see any errors related to MobileFrontend, but I do see errors in the fatal.log related to MWScript.php and MWMultiVersion.php.
Comment 1 Chris McMahon 2013-08-13 13:55:37 UTC
Yesterday I tried restarting varnish on deployment-cache-text1 and restarting apache on deployment-apache2 with no effect.
Comment 2 Chris McMahon 2013-08-13 14:47:16 UTC
Today restarted memcached on deployment-memc0 and -memc1, no effect.  Nik and I continue to experiment and we've asked Ryan Lane also.
Comment 3 Antoine "hashar" Musso (WMF) 2013-08-13 19:23:11 UTC
503 means the backend service is not reachable. I made a few connection tests and they never reach the Apache backends, so there must be something weird happening at the Varnish level (deployment-cache-text1 instance).

I refreshed a random page ( http://en.wikipedia.beta.wmflabs.org/wiki/Dido_Sotiriou ) a few times. In a browser I got the 503 timeout, though I had no issue getting the page served via curl (which sends only minimal headers).

Using curl I get the page served by the Varnish text frontend. X-Cache from three requests:

X-Cache: deployment-cache-text1 hit (4), deployment-cache-text1 frontend hit (39)
X-Cache: deployment-cache-text1 hit (4), deployment-cache-text1 frontend hit (40)
X-Cache: deployment-cache-text1 hit (4), deployment-cache-text1 frontend hit (41)


Using a browser I get:

X-Cache:deployment-cache-text1 miss (0), deployment-cache-text1 frontend miss (0)


Some header(s) sent by the browser cause the request not to be cacheable.  That in turn overloads the Apache backends, which take a long time to serve the request, which might lead Varnish to serve a 503 once the timeout has been reached.
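
To compare the X-Cache values quoted above programmatically, a minimal Python sketch follows. It assumes the header always has the shape seen in this report (comma-separated entries of host, optional "frontend", hit or miss, and a count in parentheses); the function names are hypothetical and not part of any Varnish tooling.

```python
import re

# One X-Cache entry, e.g. "deployment-cache-text1 frontend hit (39)".
# The shape is inferred from the samples in this report only.
_ENTRY = re.compile(
    r"^(?P<host>\S+)\s+(?:(?P<frontend>frontend)\s+)?"
    r"(?P<result>hit|miss)\s+\((?P<count>\d+)\)$"
)


def parse_x_cache(value):
    """Split an X-Cache header value into one dict per cache layer."""
    entries = []
    for part in value.split(","):
        match = _ENTRY.match(part.strip())
        if match is None:
            raise ValueError(f"unrecognised X-Cache entry: {part!r}")
        entries.append({
            "host": match["host"],
            "frontend": match["frontend"] is not None,
            "result": match["result"],
            "count": int(match["count"]),
        })
    return entries


def served_from_cache(value):
    """True if any layer reported a hit."""
    return any(entry["result"] == "hit" for entry in parse_x_cache(value))
```

Passing the curl and browser values from above to `served_from_cache` separates the cached responses from the ones that had to go to a backend.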
Comment 4 Greg Grossmeier 2013-08-13 19:27:24 UTC
Setting priority/importance as this is blocking testing pretty badly.
Comment 5 Chris McMahon 2013-08-13 23:09:34 UTC
So Varnish is not configured properly?  Some examples like these?  https://www.varnish-cache.org/trac/wiki/VCLExamples
Comment 6 Ariel T. Glenn 2013-08-14 08:18:41 UTC
I don't think the Apaches are overloaded.  Special pages typically don't get cached: on the live site, Special:UserLogin is requested live from the Apaches every time.  And when I request it via curl directly from the Apache in labs, the response is quite fast (1 second):

curl -v -H 'Host: en.wikipedia.beta.wmflabs.org' 'http://deployment-apache32/w/index.php?title=Special:UserLogin' > out
> GET /w/index.php?title=Special:UserLogin HTTP/1.1
> User-Agent: curl/7.22.0 (x86_64-pc-linux-gnu) libcurl/7.22.0 OpenSSL/1.0.1 zlib/1.2.3.4 libidn/1.23 librtmp/2.3
> Accept: */*
> Host: en.wikipedia.beta.wmflabs.org
>
  0     0    0     0    0     0      0      0 --:--:--  0:00:01 --:--:--     0< HTTP/1.1 200 OK


Additionally, when I try curl from my laptop:
curl  -v 'http://en.wikipedia.beta.wmflabs.org/w/index.php?title=Special:UserLogin' > out

this times out, rather than giving me a hit:
> GET /w/index.php?title=Special:UserLogin HTTP/1.1
> User-Agent: curl/7.27.0
> Host: en.wikipedia.beta.wmflabs.org
> Accept: */*
> 
  0     0    0     0    0     0      0      0 --:--:--  0:00:27 --:--:--     0< HTTP/1.1 503 Service Unavailable
...
< X-Cache: deployment-cache-text1 miss (0), deployment-cache-text1 frontend miss (0)

Still investigating.
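
The two curl runs above (one straight to a backend with a Host header, one through the public hostname) could be repeated with a small Python sketch like the one below. The `probe` function is a hypothetical helper, not an existing tool; it only relies on the standard `urllib` module.

```python
import time
import urllib.error
import urllib.request


def probe(url, host=None, timeout=30.0):
    """Fetch url once; return (status, elapsed_seconds, x_cache_header)."""
    request = urllib.request.Request(url)
    if host is not None:
        # Same idea as `curl -H 'Host: ...'`: address one server directly
        # while asking for the virtual host it serves.
        request.add_header("Host", host)
    # Ignore any proxy configured in the environment so the request goes
    # straight to the host named in the URL.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    started = time.monotonic()
    try:
        with opener.open(request, timeout=timeout) as response:
            status, headers = response.status, response.headers
    except urllib.error.HTTPError as error:
        # urllib raises on 4xx/5xx; a 503 is still a result worth recording.
        status, headers = error.code, error.headers
    return status, time.monotonic() - started, headers.get("X-Cache")
```

Calling `probe` once per backend with `host="en.wikipedia.beta.wmflabs.org"` and once against the public URL would show whether a slow or failing response comes from one backend or from the cache layer. A connection that never answers raises an exception instead of returning, which is itself a useful signal.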
Comment 7 Ariel T. Glenn 2013-08-14 09:04:28 UTC
Turned out to be much simpler.  Apache on deployment-apache33 was not really running (the parent process was alive but nothing else).  Shot and restarted and now login works :-)
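
The state described here (parent process alive, no workers) can be spotted from a process listing. The sketch below is a minimal Python illustration that parses the output of `ps -eo pid,ppid,comm`; the process name `apache2` and the function name are assumptions, and the process model on a given host may differ.

```python
def apache_workers(ps_output, name="apache2"):
    """Map each top-level `name` process to the number of its `name` children.

    Expects the text printed by `ps -eo pid,ppid,comm`, header row included.
    """
    rows = []
    for line in ps_output.strip().splitlines()[1:]:  # skip the header row
        pid, ppid, comm = line.split(None, 2)
        if comm.strip() == name:
            rows.append((int(pid), int(ppid)))
    pids = {pid for pid, _ in rows}
    # A parent is a matching process whose own parent is not a matching one.
    parents = {pid: 0 for pid, ppid in rows if ppid not in pids}
    for _, ppid in rows:
        if ppid in parents:
            parents[ppid] += 1
    return parents
```

A parent mapped to 0 children matches the symptom above: the service looks like it is running, but nothing is left to answer requests. The listing itself could be obtained with `subprocess.run(["ps", "-eo", "pid,ppid,comm"], capture_output=True, text=True).stdout`.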
Comment 8 Chris McMahon 2013-08-14 14:30:54 UTC
Now we seem to have no js/css though.  I'll look for an issue on deployment-cache-bits03 but I'm not sure what to look for...
Comment 9 Ariel T. Glenn 2013-08-14 14:47:51 UTC
I am getting js and css both at login and on pages I view afterwards (that are slow enough that I'm pretty sure they are rendered and not cached).
Comment 10 Chris McMahon 2013-08-14 14:52:03 UTC
OK, seems to be better now, thanks very much!
Comment 11 Antoine "hashar" Musso (WMF) 2013-08-14 21:33:06 UTC
(In reply to comment #7)
> Turned out to be much simpler.  Apache on deployment-apache33 was not really
> running (the parent process was alive but nothing else).  Shot and restarted
> and now login works :-)

Ariel rocks. I thought about one of the app servers not responding, but since both had Apache running I did not investigate much.

We definitely need Icinga monitoring :)
Comment 12 Antoine "hashar" Musso (WMF) 2013-08-14 22:04:56 UTC
I have filed bug 52867 to have the Apache service monitored.
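
A monitoring check of this kind usually follows the Nagios plugin convention that Icinga also uses: exit code 0 for OK, 1 for WARNING, 2 for CRITICAL. The Python sketch below only illustrates that mapping; the function name and the 5-second warning threshold are assumptions, not taken from bug 52867.

```python
OK, WARNING, CRITICAL = 0, 1, 2  # Nagios/Icinga plugin exit codes


def check_result(status, elapsed, warn_after=5.0):
    """Turn one HTTP probe into (exit_code, message).

    status is the HTTP status code, or None if no response arrived;
    elapsed is the response time in seconds.
    """
    if status is None:
        return CRITICAL, "CRITICAL - no response"
    if status >= 500:
        return CRITICAL, f"CRITICAL - HTTP {status}"
    if elapsed > warn_after:
        return WARNING, f"WARNING - HTTP {status} in {elapsed:.1f}s"
    return OK, f"OK - HTTP {status} in {elapsed:.1f}s"
```

A wrapper script would print the message and exit with the code, so the 27-second 503 seen in comment 6 would have been reported as CRITICAL.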
