Last modified: 2014-09-23 22:56:19 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and kept for historical purposes. It is not possible to log in, and apart from displaying bug reports and their history, links might be broken. See T50668, the corresponding Phabricator task, for complete and up-to-date bug report information.
Bug 48668 - Set up Icinga monitoring for grid
Status: NEW
Product: Wikimedia Labs
Classification: Unclassified
Component: tools
Version: unspecified
Hardware: All All
Importance: High enhancement
Target Milestone: ---
Assigned To: Marc A. Pelletier
Depends on:
Blocks:
Reported: 2013-05-21 04:16 UTC by Tim Landscheidt
Modified: 2014-09-23 22:56 UTC


Description Tim Landscheidt 2013-05-21 04:16:44 UTC
Besides the Ganglia statistics, the grid's status should be properly monitored and alarms set up. Off the top of my head and without data to back it up:

- Master alive and well (no threads in error state!),
- every execution daemon alive and well,
- count of jobs in error state doesn't exceed 5 % of all jobs running,
- count of jobs pending doesn't exceed 5 % of all jobs running.
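
The last two conditions above could be sketched as a small Icinga-style check. This is only an illustration, not an existing plugin: the function names `classify` and `check_jobs` are hypothetical, the job state strings are assumed to look like the state column of Grid Engine's `qstat` output (`r`, `qw`, `Eqw`, ...), and treating every threshold breach as CRITICAL (and "no running jobs" as UNKNOWN) is an arbitrary choice. Only the exit code values follow the usual Nagios/Icinga plugin convention.

```python
from collections import Counter

# Conventional Nagios/Icinga plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def classify(state):
    """Bucket a job state string (assumed qstat-style, e.g. 'r', 'qw', 'Eqw')."""
    if "E" in state:       # any error flag wins, e.g. 'Eqw'
        return "error"
    if "qw" in state:      # queued and waiting, including held jobs ('hqw')
        return "pending"
    if "r" in state:       # running, including restarted jobs ('Rr')
        return "running"
    return "other"


def check_jobs(states, limit_percent=5):
    """Return (exit_code, message) for a list of job state strings.

    A breach means the error or pending count exceeds limit_percent
    of the number of running jobs.
    """
    counts = Counter(classify(s) for s in states)
    running = counts["running"]
    if running == 0:
        # The ratio is undefined without running jobs (assumed handling).
        return UNKNOWN, "GRID UNKNOWN - no running jobs to compare against"

    problems = []
    for kind in ("error", "pending"):
        # Integer comparison avoids floating-point rounding at the boundary.
        if counts[kind] * 100 > limit_percent * running:
            problems.append(f"{counts[kind]} {kind} vs {running} running")

    if problems:
        return CRITICAL, "GRID CRITICAL - " + "; ".join(problems)
    return OK, (f"GRID OK - {running} running, {counts['error']} error, "
                f"{counts['pending']} pending")
```

A wrapper script would collect the state strings, print the returned message, and exit with the returned code so that Icinga can pick up the result. The checks for the master and the execution daemons are not covered by this sketch.
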
Comment 1 Sumana Harihareswara 2013-09-28 00:18:28 UTC
As with bug 51434, I think this would be a very good step for improving the reliability of the services we provide -- and getting stats to show it. :-)
