Last modified: 2013-03-12 19:02:54 UTC
We are no longer able to access EE-Prototype on WMFLabs: http://ee-prototype.wmflabs.org/ Could you help us get it back up? We need the site for a critical AFT5 deployment this week, led by Matthias Mullie. Here's the error message: "Unable to connect. Firefox can't establish a connection to the server at ee-prototype.wmflabs.org." What do you recommend we do to solve this problem quickly? We rely on this site to test all our editor engagement features: Echo, Article Feedback and Page Curation. Thanks for any help you can provide :) Fabrice
Labs issue -> Moving to Labs as a start. "MW Ext > AFTv5" should only refer to issues in your AFTv5 codebase.
Thanks, Andre, much appreciated! Also, for the record, we received this recommendation from Antoine Musso on Mar 12, 2013, at 10:58 AM:
--------------------------------------
First step ever: open a new bug report in bugzilla!
Second step: hang out in #wikimedia-labs
I don't have access to that instance. Some things worth a try:
1) verify whether you can ssh to the instance from labsconsole
2) look at the instance console in labsconsole
3) If those fail, reboot the instance from the labsconsole
If you ever manage to connect to the instance, make sure Apache is running :-)
--------------------------------------
FF: I asked for help on both #wikimedia-labs and #wikimedia-operations, but nobody has responded to my question yet :(
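The connectivity checks suggested above (can the instance be reached over SSH, and is Apache answering) can be sketched as a small TCP probe. This is a minimal sketch, not part of the original advice: the helper names `port_open` and `diagnose` are hypothetical, ports 22 and 80 are assumed to be the SSH and Apache ports, and the probe only tells you whether something accepts connections, not whether the service behind it is healthy. It would need to be run from a host that can route to the instance, such as the bastion.

```python
import socket


def port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the name and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers "connection refused", "no route to host", DNS failures
        # and timeouts alike; all mean the port is not usable right now.
        return False


def diagnose(host, ports=(22, 80)):
    """Probe each port and return a {port: reachable} mapping.

    22 (SSH) and 80 (HTTP/Apache) are assumed defaults, not values
    confirmed for this instance.
    """
    return {port: port_open(host, port) for port in ports}
```

For example, `diagnose("ee-prototype")` run on the bastion would return a dictionary such as `{22: False, 80: False}` while the instance is unreachable.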
I just attempted to SSH into ee-prototype, but that failed:
mlitn@bastion1:~$ ssh ee-prototype
ssh: connect to host ee-prototype port 22: No route to host
Rebooting the instance through Special:NovaInstance didn't seem to change anything either.
More recommendations via email:
On Mar 12, 2013, at 11:42 AM, Ryan Lane wrote:
Your instance OOM'd (out of memory). If for some reason you can't connect, you should check the instance's console log (available from wikitech) to see if something has occurred to the instance. In this situation you guys can fix the problem with a reboot. I've done this for you.
...and from IRC:
"[11:49am] Ryan_Lane: so the kernel started killing processes ... but failed?
[11:50am] Ryan_Lane: well, it succeeded, but it killed processes that are needed for it to stay alive
[11:50am] Ryan_Lane: the OOM killer isn't exactly smart
[11:50am] Ryan_Lane: you can prioritize its killing, of course
[11:51am] Ryan_Lane: we should likely have defaults"
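On Linux, the OOM killer's choice of victim can be influenced per process through `/proc/<pid>/oom_score_adj`, which accepts values from -1000 (never kill) to 1000 (kill first). A minimal sketch of setting it follows. The function name `set_oom_score_adj` and the `proc_root` parameter are hypothetical, added here for illustration and testing. Nothing in this thread says which processes or values Labs would use as defaults. Lowering the value for another process usually requires root.

```python
from pathlib import Path

# Range accepted by the kernel for /proc/<pid>/oom_score_adj.
OOM_SCORE_ADJ_MIN = -1000
OOM_SCORE_ADJ_MAX = 1000


def set_oom_score_adj(pid, value, proc_root="/proc"):
    """Write an OOM score adjustment for the given process ID.

    Lower values make the process less likely to be chosen by the
    OOM killer; -1000 exempts it entirely. proc_root is a hypothetical
    parameter that lets the function be pointed at a test directory.
    """
    if not OOM_SCORE_ADJ_MIN <= value <= OOM_SCORE_ADJ_MAX:
        raise ValueError(
            f"oom_score_adj must be between {OOM_SCORE_ADJ_MIN} "
            f"and {OOM_SCORE_ADJ_MAX}, got {value}"
        )
    path = Path(proc_root) / str(pid) / "oom_score_adj"
    path.write_text(f"{value}\n")
    return path
```

For instance, a process that the instance needs in order to stay reachable could be protected with `set_oom_score_adj(pid, -1000)`, at the cost of the kernel having to pick some other process when memory runs out.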
Rebooting worked. ee-prototype is on a virtual host with a *lot* of instances, which caused the reboot to take a while.