Last modified: 2013-07-19 21:27:04 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and kept for historical purposes. It is not possible to log in, and apart from displaying bug reports and their history, links might be broken. See T44127, the corresponding Phabricator task, for complete and up-to-date bug report information.
Bug 42127 - memcache on labsconsole.wikimedia.org craps out pretty often
Status: RESOLVED FIXED
Product: Wikimedia Labs
Classification: Unclassified
Component: Infrastructure
Version: unspecified
Hardware: All All
Importance: High normal
Target Milestone: ---
Assigned To: Ryan Lane
Duplicates: 44499
Depends on:
Blocks:
Reported: 2012-11-15 01:28 UTC by MZMcBride
Modified: 2013-07-19 21:27 UTC (History)
Description MZMcBride 2012-11-15 01:28:11 UTC
Currently when I try to access <https://labsconsole.wikimedia.org/wiki/Main_Page>, it takes about 15 seconds to load. Something is broken.
Comment 1 Andrew Bogott 2012-11-15 01:42:32 UTC
Restarted memcached on virt0.  Is that better?
Comment 2 MZMcBride 2012-11-15 01:54:30 UTC
(In reply to comment #1)
> Restarted memcached on virt0.  Is that better?

Yes, much. Thank you. :-)

I'm not sure if this bug is now resolved or if there's an underlying issue that needs to be investigated and corrected. I'll leave that determination to you.
Comment 3 Antoine "hashar" Musso (WMF) 2013-01-21 08:54:48 UTC
The memcached instance on virt0 dies about twice per day, which makes labsconsole practically unusable and also drops every user session.

I would say that is part of the Infrastructure component. Please ping either Ryan Lane or Andrew Bogott (ops team) to get this issue resolved.

Possible culprit: virt0 going out of memory and memcached being killed by Linux out of memory killer.
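
One way to test the out-of-memory theory is to search the kernel log for OOM-killer messages that name memcached. The following is a minimal sketch, not something taken from virt0: the function name `find_oom_kills` is hypothetical, and the regular expression assumes the two message shapes Linux kernels commonly print ("Out of memory: Kill process N (name)" and "Killed process N (name)").

```python
import re

# Assumed message shapes; the exact wording varies between kernel versions.
OOM_RE = re.compile(
    r"(?:Out of memory: Kill(?:ed)? process|Killed process) (\d+) \(([^)]+)\)"
)


def find_oom_kills(log_text, process_name="memcached"):
    """Return (pid, line) pairs for OOM-killer messages naming process_name."""
    hits = []
    for line in log_text.splitlines():
        match = OOM_RE.search(line)
        if match and match.group(2) == process_name:
            hits.append((int(match.group(1)), line.strip()))
    return hits
```

It could be fed the output of `dmesg` or the contents of a kernel log file. An empty result would argue against the OOM-killer explanation, though it would not rule out other causes.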
Comment 4 Andrew Bogott 2013-01-21 17:31:02 UTC
The crash is caused by a hardware problem on the host (bad memory, probably).  We have plans to migrate to fresh hardware but it won't happen immediately.

In theory puppet is restarting memcached when it runs, which should limit the periods of outage.  Have y'all experienced more than 30 minutes at a time of this?
Comment 5 Antoine "hashar" Musso (WMF) 2013-01-21 18:14:13 UTC
Thanks for the confirmation Andrew.  Can't we bring the machine down and run a memory test? That should isolate the faulty memory.

As for memcached, I usually have someone from ops restart memcached, so downtime is pretty short for me :-]

I could not access the Nagios history for the "Virt0 > memcached" service.  Hard to know how long it stays down.
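
Without the Nagios history, a rough substitute is to poll memcached directly: its text protocol answers a `stats` command with `STAT name value` lines ending in `END`, and the `uptime` value shows how recently the daemon was (re)started. This is only a sketch under assumptions: the helper names `parse_stats` and `fetch_stats` are hypothetical, and the host and port are placeholders (11211 is memcached's default port), not the actual virt0 configuration.

```python
import socket


def parse_stats(raw: bytes) -> dict:
    """Turn a memcached 'stats' reply into a {name: value} dict of strings."""
    stats = {}
    for line in raw.decode("ascii", errors="replace").split("\r\n"):
        parts = line.split(" ", 2)
        if len(parts) == 3 and parts[0] == "STAT":
            stats[parts[1]] = parts[2]
    return stats


def fetch_stats(host="127.0.0.1", port=11211, timeout=2.0):
    """Return parsed stats, or None if memcached does not answer in time."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as conn:
            conn.sendall(b"stats\r\n")
            buf = b""
            # Read until the terminating END line or until the peer closes.
            while not buf.endswith(b"END\r\n"):
                chunk = conn.recv(4096)
                if not chunk:
                    break
                buf += chunk
    except OSError:  # refused, unreachable, or timed out
        return None
    return parse_stats(buf)
```

Calling `fetch_stats()` from a cron job and logging whether it returns `None`, along with the reported `uptime`, would give an approximate record of when the daemon went down and came back.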
Comment 6 Ryan Lane 2013-01-30 18:56:14 UTC
*** Bug 44499 has been marked as a duplicate of this bug. ***
Comment 7 Ryan Lane 2013-01-30 18:57:49 UTC
I've downgraded memcached on virt0. I've noticed the same behavior on nova-precise2, so it's very likely not a memory issue, but some memcached bug. If we still experience crashes, then I'll install a version with debugging symbols so that I can get a proper backtrace.
Comment 8 Laurence 'GreenReaper' Parry 2013-07-19 21:13:23 UTC
I got the error from 44499 when loading a set of five tabs on the same wiki after a browser restart.

This is not necessarily memcache, since this box is running on WinCache.

Just not waiting long enough? The server may have had to spin up as well.

----
Could not acquire 'ImpulseWiki:messages:en:status' lock.

Backtrace:

#0 D:\MediaWiki\core\includes\cache\MessageCache.php(710): MessageCache->load('en')
#1 D:\MediaWiki\core\includes\cache\MessageCache.php(650): MessageCache->getMsgFromNamespace('Pagetitle', 'en')
#2 D:\MediaWiki\core\includes\Message.php(720): MessageCache->get('pagetitle', true, Object(Language))
#3 D:\MediaWiki\core\includes\Message.php(464): Message->fetchMessage()
#4 D:\MediaWiki\core\includes\Message.php(553): Message->toString()
#5 D:\MediaWiki\core\includes\OutputPage.php(835): Message->text()
#6 D:\MediaWiki\core\includes\OutputPage.php(878): OutputPage->setHTMLTitle(Object(Message))
#7 D:\MediaWiki\core\includes\Article.php(554): OutputPage->setPageTitle('Powershell suck...')
#8 D:\MediaWiki\core\includes\actions\ViewAction.php(44): Article->view()
#9 D:\MediaWiki\core\includes\Wiki.php(439): ViewAction->show()
#10 D:\MediaWiki\core\includes\Wiki.php(305): MediaWiki->performAction(Object(Article), Object(Title))
#11 D:\MediaWiki\core\includes\Wiki.php(565): MediaWiki->performRequest()
#12 D:\MediaWiki\core\includes\Wiki.php(458): MediaWiki->main()
#13 D:\MediaWiki\core\index.php(59): MediaWiki->run()
#14 {main}
Comment 9 Ryan Lane 2013-07-19 21:27:04 UTC
This was due to a newer version of memcache and the way it handles memory exhaustion. I'm not sure why it's still open; it was fixed ages ago.
