Last modified: 2014-08-14 06:11:23 UTC

This static website is read-only and for historical purposes. It is not possible to log in and except for displaying bug reports and their history, links might be broken. See T67179, the corresponding Phabricator task for complete and up-to-date bug report information.
Bug 65179 - Random 503 Service Temporarily Unavailable errors from tools-webproxy
Status: RESOLVED FIXED
Product: Wikimedia Labs
Classification: Unclassified
Component: tools
Version: unspecified
Hardware: All All
Importance: Unprioritized major
Target Milestone: ---
Assigned To: Yuvi Panda
Duplicates: 65272
Depends on:
Blocks:
Reported: 2014-05-11 08:35 UTC by metatron
Modified: 2014-08-14 06:11 UTC

Description metatron 2014-05-11 08:35:07 UTC
Since yesterday, some new errors have been occurring at random:

503 Service Temporarily Unavailable
followed by
The connection was reset

After reloading the page 2-4x, everything is back to normal.

So these are none of the old friends (404/OOM, or 500 errors from lighttpd, which is working), but new ones. It looks like something from tools-webproxy.
Comment 1 MZMcBride 2014-05-11 08:37:18 UTC
I think I'm hitting the same issue:

$ curl -I "http://tools.wmflabs.org/mzmcbride/"
curl: (52) Empty reply from server


$ curl -I "http://tools.wmflabs.org/mzmcbride/"
HTTP/1.1 503 Service Temporarily Unavailable
Server: nginx/1.7.0
Date: Sun, 11 May 2014 08:36:09 GMT
Content-Type: text/html
Connection: keep-alive
X-Powered-By: PHP/5.3.10-1ubuntu3.10+wmf1
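
To put a number on how "random" the failures are, the single curl calls above could be repeated and the outcomes counted. The following is only a rough sketch: the function names `probe` and `tally` and the default of 20 attempts are illustrative assumptions, not anything used in this report.

```python
import collections
import http.client
import urllib.error
import urllib.request


def probe(url, timeout=10):
    """Send one HEAD request (like `curl -I`) and return a short outcome label."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return str(response.status)
    except urllib.error.HTTPError as error:
        # 4xx/5xx responses such as the 503 above are raised as HTTPError.
        return str(error.code)
    except (OSError, http.client.HTTPException) as error:
        # Connection resets, timeouts and empty replies end up here.
        return type(error).__name__


def tally(url, attempts=20, probe_fn=probe):
    """Probe `url` `attempts` times and count each distinct outcome."""
    return collections.Counter(probe_fn(url) for _ in range(attempts))


if __name__ == "__main__":
    print(tally("http://tools.wmflabs.org/mzmcbride/"))
```

A mix of status codes and exception names in the resulting counter would match the pattern described here, where a few reloads are enough to get a normal response.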
Comment 2 metatron 2014-05-11 13:41:25 UTC
Plus a new one:
Error code: ERR_SSL_PROTOCOL_ERROR

It also disappears after a few reloads.
Comment 3 metatron 2014-05-11 14:03:45 UTC
Since this coincides with yesterday's gzip modification, how about reverting that patch?
Comment 4 Yuvi Panda 2014-05-11 15:12:15 UTC
We also upgraded nginx yesterday, so that might also be the reason.
Comment 5 Merlijn van Deen (test) 2014-05-13 19:23:23 UTC
There are also issues with Pywikibot's nightlies, where the transfer stops after ~50 kB. This might be related, but there are no 500s involved.

https://bugzilla.wikimedia.org/65272
Comment 6 Yuvi Panda 2014-05-13 19:40:11 UTC
Is this still happening? I rolled back the nginx change right after making that comment (and mentioned it on IRC, but didn't get time to respond here, sorry about that), so if it is 'gone', it is just an issue with the newer nginx version.
Comment 7 Merlijn van Deen (test) 2014-05-13 20:21:35 UTC
It's definitely happening right now:

503 Service Temporarily Unavailable

nginx/1.5.0
Comment 8 metatron 2014-05-13 20:25:27 UTC
Here's some feedback:
After the rollback, everything went back to usual (not normal). Right now there are 503s on all channels.
Comment 9 Morten Wang 2014-05-13 20:27:06 UTC
Same problem, and I see no entries in the access.log or error.log. lighttpd appears to be running, so it seems HTTP requests don't make it past the proxy.
Comment 10 Merlijn van Deen (test) 2014-05-13 20:59:33 UTC
*** Bug 65272 has been marked as a duplicate of this bug. ***
Comment 11 Gerrit Notification Bot 2014-05-13 21:18:51 UTC
Change 133172 had a related patch set uploaded by Yuvipanda:
dynamicproxy: Use redis connection pooling

https://gerrit.wikimedia.org/r/133172
Comment 12 Tim Landscheidt 2014-05-14 00:09:05 UTC
Yuvi currently has no power for his laptop, but he commented on IRC:

| <yuvipanda_> mutante: and then I looked at the logs and the problem was that
|              there were just too many connections hanging around, since redis
|              is single threaded but nginx has multiple workers and I had set a
|              1s connection timeout but not set a connection pool
| <yuvipanda_> mutante: so now I've a connection pool with 32s timeouts for
|              purging from the pool plus a 128 max connections limit, which
|              should work  [22:46]
| <yuvipanda_> scfc_de: can you comment on the bug saying this was the problem
|              and the solution is to restart redis on tools-webproxy, for now
|              at least? I don't have my primary machine with me now  [22:47]
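
The idea described in the IRC excerpt above (reuse connections instead of opening a new one per request, drop idle ones after a timeout, and cap how many are kept) can be sketched roughly as follows. This is not the code from the Gerrit change: the class `ConnectionPool`, its parameter names, and the reading of "128 max connections" as a cap on idle pooled connections are assumptions made for illustration. The sketch is also not thread-safe.

```python
import collections
import time


class ConnectionPool:
    """Hypothetical pool: reuses idle connections, bounded in size and age."""

    def __init__(self, connect, max_idle=128, idle_timeout=32.0,
                 clock=time.monotonic):
        self._connect = connect            # callable that opens a new connection
        self._max_idle = max_idle          # assumed meaning of the 128 limit
        self._idle_timeout = idle_timeout  # seconds; the 32s purge timeout
        self._clock = clock
        self._idle = collections.deque()   # (connection, time it was released)

    def acquire(self):
        """Return a fresh-enough idle connection, or open a new one."""
        now = self._clock()
        while self._idle:
            # Take the most recently released connection first.
            connection, released_at = self._idle.pop()
            if now - released_at <= self._idle_timeout:
                return connection
            connection.close()  # idle for too long: purge it
        return self._connect()

    def release(self, connection):
        """Hand a connection back; close it if the pool is already full."""
        if len(self._idle) >= self._max_idle:
            connection.close()
        else:
            self._idle.append((connection, self._clock()))
```

Each caller would `acquire()` a connection, run its lookup, and `release()` it afterwards, so a steady request rate keeps reusing a small set of connections rather than leaving many short-lived ones hanging around on the server.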
Comment 13 Gerrit Notification Bot 2014-05-14 15:45:27 UTC
Change 133172 merged by Andrew Bogott:
dynamicproxy: Use redis connection pooling

https://gerrit.wikimedia.org/r/133172
