Last modified: 2013-11-13 09:44:53 UTC
Ganglia on wmflabs is missing disk I/O reporting. We want these metrics so we can tell which instance is doing heavy I/O, which might be killing GlusterFS (see bug 36993). There is a Gmetric plugin, based on /proc/diskstats, that we might want to use: https://github.com/ganglia/gmetric/blob/master/disk/diskio.pl/ganglia_disk_stats.pl We used to have a homegrown `ganglia-metrics` Debian package in /trunk/ganglia_metrics; it is probably obsolete nowadays. Anyway, it contained a Python script: http://svn.wikimedia.org/viewvc/mediawiki/trunk/ganglia_metrics/DiskStats.py?view=markup&pathrev=69278 Or maybe Ganglia already provides these metrics and it is just a matter of enabling them?
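For reference, /proc/diskstats has one line per block device: major number, minor number, device name, then cumulative counters. Per-second rates come from differencing two samples. The Python sketch below assumes the standard kernel field order (reads completed, reads merged, sectors read, ms reading, writes completed, writes merged, sectors written, ms writing, I/Os in progress, ms doing I/O, weighted ms) and 512-byte sector accounting. The function names are hypothetical, and this is not the code of either plugin linked above.

```python
from typing import Dict, NamedTuple

# /proc/diskstats counts sectors in 512-byte units, independent of the
# device's real sector size.
SECTOR_BYTES = 512


class DiskCounters(NamedTuple):
    reads: int            # field 1: reads completed
    sectors_read: int     # field 3: sectors read
    writes: int           # field 5: writes completed
    sectors_written: int  # field 7: sectors written
    io_ms: int            # field 10: milliseconds spent doing I/O


def parse_diskstats(text: str) -> Dict[str, DiskCounters]:
    """Map device name -> cumulative counters from /proc/diskstats text."""
    stats: Dict[str, DiskCounters] = {}
    for line in text.splitlines():
        parts = line.split()
        # major, minor, name + at least 11 counters. Shorter lines (e.g. the
        # 4-counter partition lines of old kernels) are skipped; extra
        # discard/flush counters on newer kernels are ignored.
        if len(parts) < 14:
            continue
        f = [int(x) for x in parts[3:14]]
        stats[parts[2]] = DiskCounters(f[0], f[2], f[4], f[6], f[9])
    return stats


def sample(path: str = "/proc/diskstats") -> Dict[str, DiskCounters]:
    """Read one snapshot of the counters (Linux only)."""
    with open(path) as fh:
        return parse_diskstats(fh.read())


def _delta(old: int, new: int) -> int:
    # Simplification: treat a counter that went backwards (reset or
    # wraparound) as zero activity for this interval.
    return max(new - old, 0)


def io_rates(before: Dict[str, DiskCounters],
             after: Dict[str, DiskCounters],
             interval_s: float) -> Dict[str, Dict[str, float]]:
    """Per-second rates for devices present in both snapshots."""
    rates: Dict[str, Dict[str, float]] = {}
    for name, new in after.items():
        old = before.get(name)
        if old is None:
            continue
        rates[name] = {
            "reads_per_sec": _delta(old.reads, new.reads) / interval_s,
            "writes_per_sec": _delta(old.writes, new.writes) / interval_s,
            "read_bytes_per_sec":
                _delta(old.sectors_read, new.sectors_read) * SECTOR_BYTES / interval_s,
            "write_bytes_per_sec":
                _delta(old.sectors_written, new.sectors_written) * SECTOR_BYTES / interval_s,
            # Share of wall-clock time the device had I/O in flight.
            "util_percent":
                100.0 * _delta(old.io_ms, new.io_ms) / (interval_s * 1000.0),
        }
    return rates
```

On a Linux instance, calling `sample()` twice a few seconds apart and passing both snapshots to `io_rates()` with the elapsed time gives per-device rates; reporting those per instance is what would make the heaviest I/O producers stand out.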
Ryan: Do you plan to work on this (as you're set as the assignee)?
I'm the default assignee. I added this for anyone to work on.
Change 85669 had a related patch set uploaded by Hashar: ganglia wrapper for py plugins (and add diskstat plugin) https://gerrit.wikimedia.org/r/85669
I wrote a puppet patch which is now pending review/merge by ops.
Change 91351 had a related patch set uploaded by Hashar: ganglia: diskstat.py plugin https://gerrit.wikimedia.org/r/91351
Change 91352 had a related patch set uploaded by Hashar: contint: monitor CI server diskstats in Ganglia https://gerrit.wikimedia.org/r/91352
Change 91351 merged by Ori.livneh: ganglia: diskstat.py plugin https://gerrit.wikimedia.org/r/91351
Change 91352 had a related patch set uploaded by Ori.livneh: contint: monitor CI server diskstats in Ganglia https://gerrit.wikimedia.org/r/91352
Change 91352 merged by Ori.livneh: contint: monitor CI server diskstats in Ganglia https://gerrit.wikimedia.org/r/91352
We now have disk stats on the production continuous integration servers (gallium and lanthanum). That was the purpose of this bug, and the changes above solved it.