Last modified: 2013-08-29 20:13:20 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and kept for historical purposes. It is not possible to log in, and apart from the pages displaying bug reports and their history, links might be broken. See T53937, the corresponding Phabricator task, for complete and up-to-date bug report information.
Bug 51937 - execution hosts are randomly unresponsive
Status: RESOLVED INVALID
Product: Wikimedia Labs
Classification: Unclassified
Component: tools
Version: unspecified
Hardware: All All
Importance: Lowest minor
Target Milestone: ---
Assigned To: Marc A. Pelletier
Depends on:
Blocks: 51935
Reported: 2013-07-24 08:50 UTC by Peter Bena
Modified: 2013-08-29 20:13 UTC
CC List: 2 users

Description Peter Bena 2013-07-24 08:50:42 UTC
This is caused by NFS outages, which are killing IRC bots. Mark this bug as resolved once all NFS-related issues are fixed and the servers can hold up for at least a month without random outages.
Comment 1 Marc A. Pelletier 2013-07-24 14:28:01 UTC
Why/how would the filesystem stalling for brief periods make IRC bots die?  Reports of what was in the logs show connections to IRC /servers/ timing out or being denied.

The problem seems to be on Freenode's side.
Comment 2 Peter Bena 2013-07-24 15:14:53 UTC
I don't know whether it's on Freenode's side or not, but when the servers become unusable (because of NFS, for example), some of the following happens:

a) labs-morebots gets disconnected (wm-bot doesn't)
b) you typically reboot the servers after fixing NFS, which kills the bots anyway
c) the system becomes unstable or crashes
d) an IRC bot may need to touch the disk, and because every tool must be hosted on NFS, that access blocks. Binaries hosted on NFS, at least CLI ones, are typically read on demand rather than loaded fully into memory, so even merely executing a program may require a read from disk

So this is a problem for IRC bots as well, even if it doesn't look like it at first sight.
Comment 3 Peter Bena 2013-07-24 15:16:19 UTC
This could probably be avoided if the bot used a network bouncer that was copied to the local filesystem and then started.

This sounds a bit complex and doesn't fix the other issues mentioned (like the reboot after a fix).
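
The copy-then-start idea could be sketched roughly as follows. This is only an illustration under assumptions: the helper names `stage_locally` and `start_from_local_copy` are hypothetical, the temporary directory is assumed to live on a local filesystem that allows execution, and nothing here reflects how the Labs tools were actually deployed.

```python
import os
import shutil
import subprocess
import tempfile


def stage_locally(nfs_path, local_dir=None):
    """Copy an executable from shared storage to local disk.

    Returns the path of the local copy. `local_dir` is assumed to be on a
    local (non-NFS) filesystem; a fresh temporary directory is used if omitted.
    """
    if local_dir is None:
        local_dir = tempfile.mkdtemp(prefix="bot-local-")
    local_path = os.path.join(local_dir, os.path.basename(nfs_path))
    shutil.copy2(nfs_path, local_path)  # copies contents and permission bits
    # Make sure the owner can execute the copy, whatever the source mode was.
    os.chmod(local_path, os.stat(local_path).st_mode | 0o100)
    return local_path


def start_from_local_copy(nfs_path, args=(), local_dir=None):
    """Stage the executable locally, then start it from the local copy."""
    local_path = stage_locally(nfs_path, local_dir)
    # Once started, the process image is paged in from local disk, not NFS.
    return subprocess.Popen([local_path, *args])
```

Only the executable itself is protected this way. Any configuration, log, or library file the process later opens on NFS can still block, and a host reboot still kills the process, as noted above.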
Comment 4 Marc A. Pelletier 2013-08-29 20:13:20 UTC
This has been rendered moot by the NFS server no longer stalling.
