Last modified: 2014-08-09 00:19:33 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and for historical purposes. It is not possible to log in, and apart from displaying bug reports and their history, links might be broken. See T68989, the corresponding Phabricator task, for complete and up-to-date bug report information.
Bug 66989 - stream.wikimedia.org throws websocket.WebSocketException: Handshake Status 502 Bad Gateway
Status: NEW
Product: Wikimedia
Classification: Unclassified
Component: Stream
Version: wmf-deployment
Hardware: All, OS: All
Importance: Unprioritized normal
Target Milestone: ---
Assigned To: Nobody - You can work on this!
Depends on:
Blocks:
Reported: 2014-06-23 19:01 UTC by Merlijn van Deen (test)
Modified: 2014-08-09 00:19 UTC
CC: 8 users

See Also:
Web browser: ---
Mobile Platform: ---
Assignee Huggle Beta Tester: ---



Description Merlijn van Deen (test) 2014-06-23 19:01:28 UTC
Using either
 - the pywikibot page generator implementation,
 - the python implementation on https://wikitech.wikimedia.org/wiki/RCStream, and
 - http://codepen.io/Krinkle/full/laucI/
Comment 1 Merlijn van Deen (test) 2014-07-13 19:44:07 UTC
Note that it does eventually work if you keep retrying the connection.
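Until the root cause is fixed, the "keep retrying" workaround can be sketched as a retry loop with exponential backoff. This is a minimal sketch assuming a hypothetical zero-argument `connect` callable that opens the stream and raises when the handshake fails; it is not part of pywikibot or the RCStream example client, and the attempt count and delays are arbitrary.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def connect_with_retry(
    connect: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Retry a hypothetical `connect` callable with exponential backoff.

    `connect` should open the stream and raise if the handshake fails
    (for example with a 502). `sleep` is injectable so the loop can be tested.
    """
    last_error: Optional[Exception] = None
    for attempt in range(max_attempts):
        try:
            return connect()
        except Exception as exc:  # failed handshake: remember it and try again
            last_error = exc
            if attempt < max_attempts - 1:
                # Wait 0.5 s, 1 s, 2 s, ... before the next attempt.
                sleep(base_delay * 2 ** attempt)
    raise ConnectionError(f"gave up after {max_attempts} attempts") from last_error
```

Backing off rather than reconnecting in a tight loop avoids hammering the servers; either way, retrying only masks an intermittent handshake failure rather than fixing it.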
Comment 2 Krinkle 2014-07-13 19:52:38 UTC
http://codepen.io/Krinkle/full/laucI/ seems to work most of the time, but in about 1 of 20 attempts I see the following in the network panel:

ws://stream.wikimedia.org/socket.io/1/websocket/281487980761
> Error during WebSocket handshake: Unexpected response code: 502

After that it falls back to xhr-polling with loads of paired POST/GET requests.
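The URL in the error above has the shape of a socket.io 0.9 transport URL, which is why connecting takes more than one request: the client first asks the server to create a session, then opens a WebSocket for that session ID. A minimal sketch of that client side, assuming the socket.io 0.9 handshake format (a `GET /socket.io/1/` whose response body is `sid:heartbeat_timeout:close_timeout:transports`); the function names are hypothetical, and the timeout values in the sample body are illustrative rather than taken from stream.wikimedia.org.

```python
from typing import List, NamedTuple, Optional


class Handshake(NamedTuple):
    session_id: str
    heartbeat_timeout: Optional[int]  # seconds; None if the field is empty
    close_timeout: Optional[int]      # seconds; None if the field is empty
    transports: List[str]


def handshake_url(host: str) -> str:
    # Request 1: ask the server to create a session.
    return f"http://{host}/socket.io/1/"


def parse_handshake(body: str) -> Handshake:
    """Parse a body such as '281487980761:60:60:websocket,xhr-polling'."""
    sid, heartbeat, close, transports = body.strip().split(":")
    return Handshake(
        session_id=sid,
        heartbeat_timeout=int(heartbeat) if heartbeat else None,
        close_timeout=int(close) if close else None,
        transports=transports.split(","),
    )


def websocket_url(host: str, handshake: Handshake) -> str:
    # Request 2: open a WebSocket for the session created by request 1.
    # Without shared session storage, only the backend that handled
    # request 1 knows this session ID.
    return f"ws://{host}/socket.io/1/websocket/{handshake.session_id}"


hs = parse_handshake("281487980761:60:60:websocket,xhr-polling")
print(websocket_url("stream.wikimedia.org", hs))
# ws://stream.wikimedia.org/socket.io/1/websocket/281487980761
```

If the WebSocket request fails, a socket.io client can fall back to another transport from the list, such as `xhr-polling`, which matches the paired POST/GET traffic described above.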
Comment 3 John Mark Vandenberg 2014-07-13 22:04:59 UTC
If we want to get this working against beta, I have a WIP change for that: https://gerrit.wikimedia.org/r/#/c/138312/. Ideas and code are welcome on how to allow for beta in our site/family structure.
Comment 4 Ori Livneh 2014-08-09 00:19:33 UTC
Merlijn van Deen offered to look into this with me, and we were able to identify the problem. The WebSocket handshake requires two round-trips to the server, and the load balancers were configured to distribute incoming requests across backends in round-robin fashion. Because the requests that make up the initial handshake follow each other in quick succession, the most common case was that the first request was routed to one server and the follow-up request to another. That second server had not started negotiating a session with the client, so it was not expecting the request.

This also explains why it sometimes worked: if another client's request arrived between your two requests, your follow-up request would be routed back to the same server and the handshake would succeed.
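The failure mode can be reproduced with a toy model of a round-robin balancer in front of two backends. This is an illustration, not the actual load balancer configuration: the class, the backend names, and the simplification that the client supplies its own session ID are all hypothetical.

```python
import itertools
from typing import Dict, List, Set


class RoundRobinBalancer:
    """Toy model: each request goes to the next backend in a fixed cycle."""

    def __init__(self, backends: List[str]) -> None:
        self._next_backend = itertools.cycle(backends)
        # Session IDs that each backend created in step 1 of the handshake.
        self.sessions: Dict[str, Set[str]] = {b: set() for b in backends}

    def handshake(self, sid: str) -> str:
        """Step 1: whichever backend receives this request creates the session."""
        backend = next(self._next_backend)
        self.sessions[backend].add(sid)
        return backend

    def upgrade(self, sid: str) -> bool:
        """Step 2: succeeds only if it reaches a backend that knows `sid`.

        A False result stands in for the 502 seen by clients.
        """
        backend = next(self._next_backend)
        return sid in self.sessions[backend]


# Back-to-back requests from one client: step 2 hits the other backend.
lb = RoundRobinBalancer(["backend-a", "backend-b"])
lb.handshake("alice")
print(lb.upgrade("alice"))  # False

# Another client's request in between: step 2 wraps back to backend-a.
lb = RoundRobinBalancer(["backend-a", "backend-b"])
lb.handshake("alice")
lb.handshake("bob")
print(lb.upgrade("alice"))  # True
```

With two backends and strict round-robin, a client's follow-up request returns to the same backend exactly when an odd number of other requests arrive in between, so under real traffic success looks random. With a single backend the cycle has length one and step 2 always succeeds.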

Giuseppe and I decided to temporarily "fix" this by simply shutting down one of the servers, so that all requests were routed to the single remaining server. This made the errors go away, validating the diagnosis. The more permanent fix is to use a different scheduling algorithm that makes sessions sticky. This is implemented in <https://gerrit.wikimedia.org/r/#/c/152960/>, which will most likely be deployed in the next few days.
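One common way to make sessions sticky is source hashing: choose the backend from a hash of the client's IP address, so every request from that client reaches the same server. The comment does not say which scheduler the linked change uses; this is a minimal sketch of the general technique (Linux IPVS, for example, offers a source-hashing `sh` scheduler), and the function name and example addresses are hypothetical.

```python
import hashlib
from typing import List


def pick_backend(client_ip: str, backends: List[str]) -> str:
    """Choose a backend from a stable hash of the client's IP address.

    Every request from the same IP maps to the same backend, so both steps
    of the handshake reach the server that created the session.
    """
    if not backends:
        raise ValueError("no backends available")
    digest = hashlib.sha256(client_ip.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(backends)
    return backends[index]


backends = ["backend-a", "backend-b"]
first = pick_backend("198.51.100.7", backends)   # handshake request
second = pick_backend("198.51.100.7", backends)  # WebSocket upgrade
print(first == second)  # True
```

`hashlib` is used instead of Python's built-in `hash()`, whose string hashing is randomized per process. Two trade-offs of this simple modulo scheme: clients behind one NAT address all land on the same backend, and changing the number of backends remaps most clients.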
