Last modified: 2014-08-14 06:11:23 UTC
Since yesterday, new errors have been occurring at random: "503 Service Temporarily Unavailable", followed by "The connection was reset". After reloading the page 2-4 times, everything is back to normal. So these are none of the old friends (404/OOM, 500 errors from lighttpd, which is working) but new ones. It looks like something in tools-webproxy.
I think I'm hitting the same issue:

$ curl -I "http://tools.wmflabs.org/mzmcbride/"
curl: (52) Empty reply from server
$ curl -I "http://tools.wmflabs.org/mzmcbride/"
HTTP/1.1 503 Service Temporarily Unavailable
Server: nginx/1.7.0
Date: Sun, 11 May 2014 08:36:09 GMT
Content-Type: text/html
Connection: keep-alive
X-Powered-By: PHP/5.3.10-1ubuntu3.10+wmf1
Plus a new one: "Error code: ERR_SSL_PROTOCOL_ERROR". It also disappears after a few reloads.
Since this coincides with yesterday's gzip modification, how about reverting that patch?
We also upgraded nginx yesterday, so that might also be the reason.
There are also issues with Pywikibot's nightlies: transfers stop after ~50 kB. This might be related, but there are no 500s involved. https://bugzilla.wikimedia.org/65272
Is this still happening? I rolled back the nginx change right after making that comment (and mentioned it on IRC, but didn't get time to respond here, sorry about that), so if it is 'gone', it is just an issue with the newer nginx version.
It's definitely happening right now:

503 Service Temporarily Unavailable
nginx/1.5.0
Here's some feedback: after the rollback, everything went back to the usual (not normal) state. Right now there are 503s on all channels.
Same problem, and I see no entries in the access.log or error.log. lighttpd appears to be running, thus it seems HTTP requests don't make it past the proxy.
*** Bug 65272 has been marked as a duplicate of this bug. ***
Change 133172 had a related patch set uploaded by Yuvipanda: dynamicproxy: Use redis connection pooling https://gerrit.wikimedia.org/r/133172
Yuvi currently has no power for his laptop, but he commented on IRC:

| <yuvipanda_> mutante: and then I looked at the logs and the problem was that
|   there were just too many connections hanging around, since redis
|   is single threaded but nginx has multiple workers and I had set a
|   1s connection timeout but not set a connection pool
| <yuvipanda_> mutante: so now I've a connection pool with 32s timeouts for
|   purging from the pool plus a 128 max connections limit, which
|   should work [22:46]
| <yuvipanda_> scfc_de: can you comment on the bug saying this was the problem
|   and the solution is to restart redis on tools-webproxy, for now
|   at least? I don't have my primary machine with me now [22:47]
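For readers unfamiliar with the idea, here is a minimal Python sketch of what a connection pool with an idle timeout and a connection cap does. It is only an illustration, not the code from change 133172: the class `ConnectionPool`, the `connect` callable and the injectable `clock` are hypothetical names, and only the 32 s idle timeout and the 128-connection limit are taken from the IRC log above.

```python
import time
from collections import deque


class ConnectionPool:
    """Toy pool: reuse idle connections, cap the total, purge stale idle ones."""

    def __init__(self, connect, max_connections=128, idle_timeout=32.0,
                 clock=time.monotonic):
        self._connect = connect            # hypothetical factory that opens a connection
        self._max = max_connections        # limit on connections open at once
        self._idle_timeout = idle_timeout  # seconds an idle connection may stay pooled
        self._clock = clock                # injectable so the timeout can be tested
        self._idle = deque()               # (connection, release_time), oldest first
        self._in_use = 0

    def _purge_stale(self):
        """Drop idle connections that have been pooled longer than the timeout."""
        now = self._clock()
        while self._idle and now - self._idle[0][1] > self._idle_timeout:
            conn, _ = self._idle.popleft()
            close = getattr(conn, "close", None)
            if close is not None:
                close()

    def acquire(self):
        """Return a pooled connection if one is idle, otherwise open a new one."""
        self._purge_stale()
        if self._idle:
            conn, _ = self._idle.pop()     # reuse the most recently released one
            self._in_use += 1
            return conn
        if self._in_use >= self._max:
            raise RuntimeError("connection limit reached")
        conn = self._connect()
        self._in_use += 1
        return conn

    def release(self, conn):
        """Hand a connection back to the pool instead of closing it."""
        self._in_use -= 1
        self._idle.append((conn, self._clock()))
```

The point of the sketch is that each worker hands its connection back for reuse rather than opening a fresh one per request, so the number of connections the single-threaded server has to juggle stays bounded.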
Change 133172 merged by Andrew Bogott: dynamicproxy: Use redis connection pooling https://gerrit.wikimedia.org/r/133172