Last modified: 2014-10-24 14:38:42 UTC
Puppet sometimes breaks. That happens, but we need to know when it happens in Beta Cluster. Something similar to the production icinga puppet freshness check would do, or we could even tie it in with Zuul and report whenever a puppet run fails.
Sadly we can't really use icinga properly on labs (so I'm told, due to the way resource collection works with puppet). Also the prod icinga stuff is not really replicable on labs either, with prod-specific config hard-coded everywhere (it would need to be converted to a module first). A different solution needs to be thought of.
One thing we could do in beta would be to send puppet logs to the beta logstash instance and then announce failures to #wikimedia-labs and/or #wikimedia-qa via a logstash rule. See bug 60690 for the general request to get puppet logs into logstash.
https://gerrit.wikimedia.org/r/#/c/143193/ will log puppetagent metrics into labs graphite, including last run time.
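Once a last run time is available per instance, a freshness check could be as small as the sketch below. This is only an illustration, not what the Gerrit change does: the host names, the two-hour threshold, and the function names are all made up, and it assumes the last run times have already been fetched as Unix timestamps.

```python
import time

# Hypothetical threshold: an instance is stale if puppet has not run for 2 hours.
STALE_AFTER_SECONDS = 2 * 60 * 60


def stale_hosts(last_run_by_host, now=None, max_age=STALE_AFTER_SECONDS):
    """Return the sorted host names whose last puppet run is older than max_age seconds.

    last_run_by_host maps a host name to the Unix timestamp of its last run.
    """
    if now is None:
        now = time.time()
    return sorted(
        host
        for host, last_run in last_run_by_host.items()
        if now - last_run > max_age
    )


def format_alert(hosts):
    """Build a one-line message suitable for an IRC or mail notification."""
    if not hosts:
        return "puppet OK on all instances"
    return "puppet stale on: " + ", ".join(hosts)


if __name__ == "__main__":
    # Example data with invented instance names.
    now = time.time()
    runs = {"example-host-1": now - 600, "example-host-2": now - 3 * 3600}
    print(format_alert(stale_hosts(runs, now=now)))
```

The message from `format_alert` could then be handed to whatever sends the IRC or mail notification.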
(In reply to Yuvi Panda from comment #1)
> Sadly we can't really use icinga properly on labs (so I'm told, due to the
> way resource collection works with puppet). Also the prod icinga stuff is
> not really replicatable on labs either, with prod specific config hard-coded
> everywhere (would need to be converted to a module first).
I wasn't thinking of necessarily copy/pasting the icinga config from prod, but we have a beta labs icinga (http://icinga.wmflabs.org/icinga/) which could theoretically be used for this, no?
(In reply to Greg Grossmeier from comment #4)
> I wasn't thinking of necessarily copy/pasting the icinga config from prod,
> but we have a beta labs icinga (http://icinga.wmflabs.org/icinga/) which
> could theoretically be used for this, no?
Theoretically, yeah :) But that effort was largely undocumented and unpuppetized, and I don't know of anyone who has actually touched that in forever.
(In reply to Yuvi Panda from comment #5)
> Theoretically, yeah :) But that effort was largely undocumented and
> unpuppetized, and I don't know of anyone who has actually touched that in
> forever.
Sad :(
We now have notifications on the IRC channel #wikimedia-qa, and a few people receive an hourly mail until all instances pass puppet. The only thing left to do is to have more people receive the notifications.
Blocks Bug 51497 - Setup monitoring for Beta cluster