Last modified: 2014-02-04 20:49:02 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and kept for historical purposes. It is not possible to log in, and apart from bug reports and their history, links might be broken. See T52485, the corresponding Phabricator task, for complete and up-to-date bug report information.
Bug 50485 - morebots (adminbot) doesn't reliably detect disconnects
Status: NEW
Product: Tool Labs tools
Classification: Unclassified
Component: Morebots
Version: unspecified
Hardware: All All
Importance: High normal
Target Milestone: ---
Assigned To: Nobody
Duplicates: 51777
 
Reported: 2013-07-01 05:07 UTC by Ori Livneh
Modified: 2014-02-04 20:49 UTC
CC List: 6 users

Description Ori Livneh 2013-07-01 05:07:58 UTC
On Jun 29 02:13 UTC adminbot logged a LocalisationUpdate. Five hours later, at 07:18 UTC, it disconnected from IRC, with a server-generated ping timeout quit message. On Jul 1 Tim noticed that it was absent from the channel and checked the process state. It appeared to still be in a connected state, calling select() at regular intervals.

lsof showed:
adminlogb 20258 adminbot    4u  IPv4 8395416      0t0    TCP wikitech-static:57198->HUBBARD.CLUB.CC.CMU.EDU:afs3-fileserver (ESTABLISHED)

strace showed:
1372650153.070033 select(5, [4], [], [], {0, 51423}) = 0 (Timeout)
1372650153.122075 gettimeofday({1372650153, 122173}, NULL) = 0
1372650153.122379 select(5, [4], [], [], {0, 100000}) = 0 (Timeout)
1372650153.222975 gettimeofday({1372650153, 223084}, NULL) = 0

According to <http://poe.perl.org/?POE_Cookbook/IRC_Bot_Reconnecting>, a good disconnection detection algorithm should periodically ping the server to check that the connection is still alive. morebots does not.

The IRC library that morebots uses, irclib, does have a 'set_keepalive' method on the ServerConnection object, which causes it to ping the server at regular intervals. morebots should use it. We should also add an explicit check that a ping reply has been received in a timely fashion, and recycle the connection otherwise.
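
A minimal sketch of the explicit ping-reply check proposed above, kept independent of irclib so that it does not assume anything about that library's event API. The class name `PingMonitor`, its method names, and the 60-second and 30-second defaults are all hypothetical; this is not morebots code.

```python
import time


class PingMonitor:
    """Tracks one outstanding keepalive ping and flags a stale connection.

    Hypothetical helper: the caller sends the actual PING and reports
    the PONG; this class only does the bookkeeping.
    """

    def __init__(self, interval=60.0, reply_timeout=30.0, clock=time.monotonic):
        self.interval = interval            # seconds between pings (assumed value)
        self.reply_timeout = reply_timeout  # seconds to wait for a reply (assumed value)
        self.clock = clock
        self.last_ping_sent = None          # send time of the ping awaiting a reply
        self.next_ping_at = clock() + interval

    def should_ping(self):
        # Only ping when no reply is pending and the interval has elapsed.
        return self.last_ping_sent is None and self.clock() >= self.next_ping_at

    def ping_sent(self):
        self.last_ping_sent = self.clock()

    def pong_received(self):
        # A reply clears the pending ping and schedules the next one.
        self.last_ping_sent = None
        self.next_ping_at = self.clock() + self.interval

    def is_stale(self):
        # Stale means a ping has gone unanswered for longer than reply_timeout.
        return (self.last_ping_sent is not None
                and self.clock() - self.last_ping_sent > self.reply_timeout)
```

The bot's main loop would call `should_ping()` on each pass, send a PING and call `ping_sent()` when it returns true, call `pong_received()` from whatever handler sees the server's reply, and recycle the connection once `is_stale()` is true. How the PING is sent and how the reply is delivered depends on the irclib version in use and is deliberately left out.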
Comment 1 Andre Klapper 2013-07-01 09:23:06 UTC
Also see bug 47275 comment 6
Comment 2 Ori Livneh 2013-07-07 23:29:04 UTC
Happened again. Increasing priority.
Comment 3 Ori Livneh 2013-07-21 18:47:40 UTC
See also:
https://bitbucket.org/jaraco/irc/issue/16/irc-client-ping-timeout-
https://bitbucket.org/jaraco/irc/issue/1/library-does-not-detect-that-connection-is

Additional notes:
* Tends to happen during the weekend.
* logmsgbot uses the same library, does not set a keepalive, and remains reliably connected.
* morebots is hosted on wikitech-static, which is hosted on Rackspace 

This supports the theory that this is caused by an aggressive TCP idle timeout that the library is not sufficiently robust to handle.

The upstream package maintainer doesn't seem especially interested in chasing this down or in documenting his changes properly, so I don't think it'd be worth the effort to update the Debian package to pull in his latest changes. I think we should implement something very crude but effective, like having the bot keep a 5-minute timer that resets whenever any data at all is read from the socket. If the timer reaches 0, the bot should kill itself and have upstart or init respawn it.
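
The crude timer described above could look roughly like the following sketch. It assumes the bot can get at the raw socket and run its own read loop; `IdleWatchdog`, `pump`, and `handle_data` are hypothetical names, and nothing here is taken from morebots or irclib.

```python
import select
import sys
import time

IDLE_LIMIT = 5 * 60  # seconds; the 5-minute figure suggested above


class IdleWatchdog:
    """Countdown that is reset whenever any data is read from the socket."""

    def __init__(self, limit=IDLE_LIMIT, clock=time.monotonic):
        self.limit = limit
        self.clock = clock
        self.last_read = clock()

    def touch(self):
        # Call after every successful read, whatever the data was.
        self.last_read = self.clock()

    def remaining(self):
        return max(0.0, self.limit - (self.clock() - self.last_read))

    def expired(self):
        return self.remaining() == 0.0


def pump(sock, watchdog, handle_data, poll=1.0):
    """Read from sock until it goes quiet for too long, then exit non-zero."""
    while not watchdog.expired():
        timeout = min(poll, watchdog.remaining())
        readable, _, _ = select.select([sock], [], [], timeout)
        if readable:
            data = sock.recv(4096)
            if not data:
                break  # peer closed the connection
            watchdog.touch()
            handle_data(data)
    # Exit so that upstart or init can respawn the process.
    sys.exit(1)
```

Exiting rather than reconnecting in-process keeps the recovery path trivial: the supervisor restarts the bot with a fresh socket and fresh library state. It only helps if the server or channel produces some traffic at least every few minutes, which is why it pairs naturally with a keepalive ping.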
Comment 4 Ryan Lane 2013-07-21 20:35:03 UTC
We can also probably move it to tools.
Comment 5 Ori Livneh 2013-07-21 22:34:16 UTC
*** Bug 51777 has been marked as a duplicate of this bug. ***
Comment 6 MZMcBride 2013-07-25 23:35:28 UTC
(In reply to comment #4)
> We can also probably move it to tools.

Filed as bug 52069.
Comment 7 MZMcBride 2013-09-23 02:47:49 UTC
morebots has gone missing from #wikimedia-operations again.
