Last modified: 2014-03-23 06:05:02 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and kept for historical purposes. It is not possible to log in, and apart from displaying bug reports and their history, links might be broken. See T52585, the corresponding Phabricator task, for complete and up-to-date bug report information.
Bug 50585 - Silence the qacct transfer jobs and monitor them with Icinga instead
Status: RESOLVED INVALID
Product: Wikimedia Labs
Classification: Unclassified
Component: tools
Version: unspecified
Hardware: All All
Importance: Low enhancement
Target Milestone: ---
Assigned To: Tim Landscheidt
Depends on: 52560
Blocks:
Reported: 2013-07-02 15:13 UTC by Tim Landscheidt
Modified: 2014-03-23 06:05 UTC (History)
CC List: 1 user

Description Tim Landscheidt 2013-07-02 15:13:54 UTC
During the NFS outage, the qacct transfer jobs pestered the roots' mailboxes every five minutes.  Though such an outage of course will never ever happen again :-), it sucked nonetheless.

The transfer job is a service, and if we monitored it as one, we would get better behaviour as well: a nice green or red icon on a web dashboard, and only one (or none?) ping by mail when the status *changes*.

So we should set up Icinga monitoring for that:

a) The transfer job directs all stdout/stderr to a file and saves its exit code in another, and Icinga periodically queries these files.

b) The transfer job passes its output and exit code directly to an Icinga sentinel that passes it somewhere up the chain.

Whether a) or b) is preferable (or possible, for that matter), I haven't figured out yet, but this bug will track the progress on that.
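
A minimal sketch of what option a) might look like, assuming a Python wrapper around the job and a check script that maps the recorded exit code to the usual Nagios/Icinga plugin states (0 = OK, 2 = CRITICAL, 3 = UNKNOWN). The function names, the file names `output.log` and `exit_code`, and the state directory are hypothetical and not taken from the actual transfer job:

```python
import subprocess
from pathlib import Path

# Plugin return codes as used by Nagios/Icinga checks.
OK, CRITICAL, UNKNOWN = 0, 2, 3


def run_and_record(cmd, state_dir):
    """Run cmd, send stdout/stderr to one file and the exit code to another."""
    state_dir = Path(state_dir)
    state_dir.mkdir(parents=True, exist_ok=True)
    with open(state_dir / "output.log", "wb") as log:
        result = subprocess.run(cmd, stdout=log, stderr=subprocess.STDOUT)
    (state_dir / "exit_code").write_text(f"{result.returncode}\n")
    return result.returncode


def check_recorded(state_dir):
    """Turn the recorded files into a (plugin status, one-line message) pair."""
    state_dir = Path(state_dir)
    try:
        code = int((state_dir / "exit_code").read_text().strip())
    except (OSError, ValueError):
        return UNKNOWN, "UNKNOWN: no recorded exit code"
    if code == 0:
        return OK, "OK: last run exited with 0"
    try:
        lines = (state_dir / "output.log").read_text(errors="replace").splitlines()
    except OSError:
        lines = []
    last = lines[-1] if lines else "(no output)"
    return CRITICAL, f"CRITICAL: last run exited with {code}: {last}"
```

The cron entry would call `run_and_record` in place of the job itself, so nothing is written to stdout and no mail is sent. A check command run by Icinga would print the message from `check_recorded` and exit with the returned status. This sketch does not detect a job that has stopped running altogether; checking the age of the `exit_code` file would be one way to cover that.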
Comment 1 Tim Landscheidt 2013-08-08 18:15:41 UTC
http://blog.endpoint.com/2012/04/monitoring-cronjob-exit-codes-with.html has an example of how to monitor cron jobs.
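
For option b) in the description, one possible shape is a passive check result: the job formats its exit code and output as a `PROCESS_SERVICE_CHECK_RESULT` external command, which Icinga 1.x (like Nagios) reads from its command file. The sketch below only illustrates that line format. The function names are hypothetical, the command file path differs between installations, and when the job runs on a different host than Icinga a relay such as NSCA would be needed instead of a direct write. It is not a description of what the linked post does.

```python
import time


def passive_result_line(host, service, exit_code, output, now=None):
    """Format a passive service check result as an external command line."""
    timestamp = int(time.time() if now is None else now)
    # External commands are single lines, so keep only the first output line.
    lines = output.strip().splitlines()
    first_line = lines[0] if lines else ""
    return (
        f"[{timestamp}] PROCESS_SERVICE_CHECK_RESULT;"
        f"{host};{service};{exit_code};{first_line}\n"
    )


def submit(line, command_file):
    """Write the line to the command file (assumed to be local and writable)."""
    with open(command_file, "w") as pipe:
        pipe.write(line)
```

The host and service names passed in have to match a service defined in Icinga with passive checks enabled; otherwise the result is discarded.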
Comment 2 Tim Landscheidt 2014-03-23 06:05:02 UTC
The qacct transfer cron job has been replaced with symlinks that make copying the accounting file unnecessary (cf. Gerrit change #114950 and Gerrit change #118120).
