Last modified: 2014-08-14 06:11:23 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and kept for historical purposes. It is not possible to log in, and apart from displaying bug reports and their history, links might be broken. See T67179, the corresponding Phabricator task, for complete and up-to-date bug report information.

Bug 65179 - Random 503 Service Temporarily Unavailable errors from tools-webproxy


Summary:	Random 503 Service Temporarily Unavailable errors from tools-webproxy

Status:	RESOLVED FIXED

Product:	Wikimedia Labs
Classification:	Unclassified
Component:	tools
Version:	unspecified
Hardware:	All All

Importance:	Unprioritized major
Target Milestone:	---
Assigned To:	Yuvi Panda

URL:
Whiteboard:
Keywords:

Duplicates:	65272
Depends on:
Blocks:

Reported:	2014-05-11 08:35 UTC by metatron
Modified:	2014-08-14 06:11 UTC
CC List:	11 users

See Also:	65272
Web browser:	---
Mobile Platform:	---
Assignee Huggle Beta Tester:	---

Description metatron 2014-05-11 08:35:07 UTC

Since yesterday, some new errors randomly occur:

503 Service Temporarily Unavailable
followed by
The connection was reset

After reloading the page 2-4x, everything is back to normal.

So these are none of the old friends (404/OOM, 500 errors from lighttpd, which works), but new ones. Looks like something from tools-webproxy.

Comment 1 MZMcBride 2014-05-11 08:37:18 UTC

I think I'm hitting the same issue:

$ curl -I "http://tools.wmflabs.org/mzmcbride/"
curl: (52) Empty reply from server


$ curl -I "http://tools.wmflabs.org/mzmcbride/"
HTTP/1.1 503 Service Temporarily Unavailable
Server: nginx/1.7.0
Date: Sun, 11 May 2014 08:36:09 GMT
Content-Type: text/html
Connection: keep-alive
X-Powered-By: PHP/5.3.10-1ubuntu3.10+wmf1
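
Editorial note: because the failure is intermittent, a single request says little. The following is a minimal Python sketch, not something posted in this bug, that repeats a request and tallies the outcomes. The helper names `tally` and `head_status` are hypothetical; `head_status` only uses the standard library's `urllib.request` and is not called here, since the host may no longer respond.

```python
from collections import Counter
import urllib.error
import urllib.request


def tally(fetch, attempts):
    """Call fetch() `attempts` times and count each outcome.

    An outcome is the returned status code (as a string) or, if fetch()
    raises, the name of the exception class.
    """
    results = Counter()
    for _ in range(attempts):
        try:
            results[str(fetch())] += 1
        except Exception as exc:  # e.g. connection reset or empty reply
            results[type(exc).__name__] += 1
    return results


def head_status(url, timeout=10):
    """Send one HEAD request and return the HTTP status code.

    urllib raises HTTPError for responses such as 503, so the code is
    read from the exception in that case.
    """
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code
```

Something like `tally(lambda: head_status("http://tools.wmflabs.org/mzmcbride/"), 20)` would then show how many of 20 attempts returned 200, returned 503, or failed at the connection level.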

Comment 2 metatron 2014-05-11 13:41:25 UTC

+ a new one
Error code: ERR_SSL_PROTOCOL_ERROR

It also disappears after a few reloads.

Comment 3 metatron 2014-05-11 14:03:45 UTC

Since there's a coincidence with the gzip modification yesterday, how about removing that patch?

Comment 4 Yuvi Panda 2014-05-11 15:12:15 UTC

We also upgraded nginx yesterday, so that might also be the reason.

Comment 5 Merlijn van Deen (test) 2014-05-13 19:23:23 UTC

There are also issues with Pywikibot's nightlies stopping transfer after ~50kB. Might be related, but there are no 500's involved.

https://bugzilla.wikimedia.org/65272

Comment 6 Yuvi Panda 2014-05-13 19:40:11 UTC

Is this still happening? I rolled back the nginx change right after making that comment (and mentioned it on IRC, but didn't get time to respond here - sorry about that), so if it is 'gone', it is just an issue with the newer nginx version.

Comment 7 Merlijn van Deen (test) 2014-05-13 20:21:35 UTC

It's definitely happening right now:

503 Service Temporarily Unavailable

nginx/1.5.0

Comment 8 metatron 2014-05-13 20:25:27 UTC

Here's some feedback:
After the rollback everything went back to usual (not normal). Right now there are 503s on all channels.

Comment 9 Morten Wang 2014-05-13 20:27:06 UTC

Same problem, and I see no entries in the access.log or error.log.  lighttpd appears to be running, thus it seems HTTP requests don't make it past the proxy.

Comment 10 Merlijn van Deen (test) 2014-05-13 20:59:33 UTC

*** Bug 65272 has been marked as a duplicate of this bug. ***

Comment 11 Gerrit Notification Bot 2014-05-13 21:18:51 UTC

Change 133172 had a related patch set uploaded by Yuvipanda:
dynamicproxy: Use redis connection pooling

https://gerrit.wikimedia.org/r/133172

Comment 12 Tim Landscheidt 2014-05-14 00:09:05 UTC

Yuvi currently has no power for his laptop, but he commented on IRC:

| <yuvipanda_> mutante: and then I looked at the logs and the problem was that
|              there were just too many connections hanging around, since redis
|              is single threaded but nginx has multiple workers and I had set a
|              1s connection timeout but not set a connection pool
| <yuvipanda_> mutante: so now I've a connection pool with 32s timeouts for
|              purging from the pool plus a 128 max connections limit, which
|              should work  [22:46]
| <yuvipanda_> scfc_de: can you comment on the bug saying this was the problem
|              and the solution is to restart redis on tools-webproxy, for now
|              at least? I don't have my primary machine with me now  [22:47]
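
Editorial note: the idea described in the IRC log above (reuse connections rather than open a new one per request, drop connections that sit idle too long, cap how many are kept) can be sketched as follows. This is a generic Python illustration under stated assumptions, not the merged dynamicproxy change: `IdlePool` and `FakeConn` are hypothetical names, the 32-second and 128 values are taken from the log, and treating 128 as a cap on idle connections is an assumption.

```python
import time
from collections import deque


class FakeConn:
    """Stand-in for a Redis connection (hypothetical, for illustration only)."""

    opened = 0  # how many connections have been created in total

    def __init__(self):
        FakeConn.opened += 1
        self.closed = False

    def close(self):
        self.closed = True


class IdlePool:
    """Keep released connections for reuse, within a size and idle-time limit."""

    def __init__(self, connect, max_size=128, idle_timeout=32.0,
                 clock=time.monotonic):
        self._connect = connect            # factory that opens a new connection
        self._max_size = max_size          # 128, the limit quoted in the IRC log
        self._idle_timeout = idle_timeout  # 32 s, the purge timeout from the log
        self._clock = clock
        self._idle = deque()               # (connection, time released), oldest first

    def _purge(self):
        """Close and drop connections that have been idle longer than the timeout."""
        now = self._clock()
        while self._idle and now - self._idle[0][1] > self._idle_timeout:
            conn, _ = self._idle.popleft()
            conn.close()

    def acquire(self):
        """Return an idle connection if one is available, else open a new one."""
        self._purge()
        if self._idle:
            conn, _ = self._idle.pop()     # most recently released
            return conn
        return self._connect()

    def release(self, conn):
        """Hand a connection back; close it if the pool is already full."""
        self._purge()
        if len(self._idle) >= self._max_size:
            conn.close()
        else:
            self._idle.append((conn, self._clock()))


pool = IdlePool(FakeConn)
first = pool.acquire()
pool.release(first)
second = pool.acquire()  # reuses the idle connection instead of opening another
print(second is first, FakeConn.opened)
```

Without such a pool, every request opens and tears down its own connection, which matches the "too many connections hanging around" symptom described in the log.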

Comment 13 Gerrit Notification Bot 2014-05-14 15:45:27 UTC

Change 133172 merged by Andrew Bogott:
dynamicproxy: Use redis connection pooling

https://gerrit.wikimedia.org/r/133172
