Last modified: 2014-10-24 14:38:42 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and kept for historical purposes. It is not possible to log in, and apart from the display of bug reports and their history, links might be broken. See T69333, the corresponding Phabricator task, for complete and up-to-date bug report information.
Bug 67333 - Yell loudly of failed puppet runs on Beta Cluster instances
Status: NEW
Product: Wikimedia Labs
Classification: Unclassified
Component: deployment-prep (beta)
Version: unspecified
Hardware: All All
Importance: High normal
Target Milestone: ---
Assigned To: Nobody - You can work on this!
Depends on: 63296
Blocks: 51497
Reported: 2014-06-30 22:40 UTC by Greg Grossmeier
Modified: 2014-10-24 14:38 UTC
CC List: 9 users

See Also:
Web browser: ---
Mobile Platform: ---
Assignee Huggle Beta Tester: ---


Attachments

Description Greg Grossmeier 2014-06-30 22:40:45 UTC
Sometimes Puppet breaks (it happens), but we need to know when it happens in Beta Cluster.

Something similar to the production Icinga Puppet freshness check would work, or we could even tie it in with Zuul and report when a Puppet run fails.
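A freshness/failure check of this kind could be sketched as follows. This is a minimal illustration, not the production check: it assumes Puppet's per-run summary (`last_run_summary.yaml`) has already been parsed into a dict with keys such as `time.last_run`, `resources.failed` and `events.failure`. The function name and the one-hour `max_age` threshold are hypothetical. It follows the Nagios/Icinga plugin convention of exit codes 0/1/2/3 for OK/WARNING/CRITICAL/UNKNOWN.

```python
import time
from typing import Any, Mapping, Optional, Tuple

# Nagios/Icinga plugin exit-code convention.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def check_puppet_run(summary: Mapping[str, Any],
                     now: Optional[float] = None,
                     max_age: float = 3600.0) -> Tuple[int, str]:
    """Classify one instance's last Puppet run.

    `summary` is assumed to be the parsed last_run_summary.yaml;
    `max_age` (seconds) is a hypothetical freshness threshold.
    """
    now = time.time() if now is None else now
    try:
        last_run = float(summary["time"]["last_run"])
    except (KeyError, TypeError, ValueError):
        return UNKNOWN, "UNKNOWN: no last_run timestamp in summary"

    failed = int(summary.get("resources", {}).get("failed", 0))
    failures = int(summary.get("events", {}).get("failure", 0))
    age = now - last_run

    # A failed run is reported before staleness: it is the more urgent signal.
    if failed or failures:
        return CRITICAL, f"CRITICAL: {failed} failed resources, {failures} failed events"
    if age > max_age:
        return WARNING, f"WARNING: last Puppet run was {age:.0f}s ago (> {max_age:.0f}s)"
    return OK, f"OK: Puppet ran {age:.0f}s ago without failures"
```

A wrapper script would print the message and `sys.exit()` with the status code, which is what Icinga (or a Zuul job) would consume.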
Comment 1 Yuvi Panda 2014-06-30 22:54:37 UTC
Sadly we can't really use icinga properly on labs (so I'm told, due to the way resource collection works with puppet). Also the prod icinga stuff is not really replicatable on labs either, with prod specific config hard-coded everywhere (would need to be converted to a module first).

A different solution needs to be thought of.
Comment 2 Bryan Davis 2014-06-30 23:00:53 UTC
One thing we could do in beta would be to send Puppet logs to the beta Logstash instance and then announce failures to #wikimedia-labs and/or #wikimedia-qa via a Logstash rule. See bug 60690 for the general request to send Puppet logs into Logstash.
Comment 3 Yuvi Panda 2014-06-30 23:16:08 UTC
https://gerrit.wikimedia.org/r/#/c/143193/ will log puppetagent metrics into labs graphite, including last run time.
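Graphite's plaintext protocol accepts one `<metric path> <value> <timestamp>` line per datapoint, conventionally on TCP port 2003. The following is a minimal sketch of turning a parsed Puppet run summary into such lines. The `puppetagent.<host>` prefix and the Graphite host passed to `send_to_graphite` are hypothetical and not necessarily what the linked Gerrit change uses.

```python
import collections.abc
import socket
from typing import Any, Dict, List, Mapping


def flatten(prefix: str, data: Mapping[str, Any]) -> Dict[str, Any]:
    """Flatten nested numeric fields into dotted Graphite metric paths."""
    out: Dict[str, Any] = {}
    for key, value in data.items():
        path = f"{prefix}.{key}"
        if isinstance(value, collections.abc.Mapping):
            out.update(flatten(path, value))
        elif isinstance(value, (int, float)) and not isinstance(value, bool):
            out[path] = value  # non-numeric fields (e.g. version strings) are skipped
    return out


def graphite_lines(host: str, summary: Mapping[str, Any], timestamp: int) -> List[str]:
    # Dots separate Graphite path components, so dots in the hostname are replaced.
    prefix = f"puppetagent.{host.replace('.', '_')}"
    metrics = flatten(prefix, summary)
    return [f"{path} {value} {timestamp}" for path, value in sorted(metrics.items())]


def send_to_graphite(lines: List[str], graphite_host: str, port: int = 2003) -> None:
    """Send newline-terminated plaintext-protocol lines over TCP."""
    payload = "".join(line + "\n" for line in lines).encode("ascii")
    with socket.create_connection((graphite_host, port), timeout=5) as sock:
        sock.sendall(payload)
```

Once `time.last_run` is stored as a metric, staleness becomes a question Graphite can answer by comparing that value with the current time.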
Comment 4 Greg Grossmeier 2014-07-01 00:12:31 UTC
(In reply to Yuvi Panda from comment #1)
> Sadly we can't really use icinga properly on labs (so I'm told, due to the
> way resource collection works with puppet). Also the prod icinga stuff is
> not really replicatable on labs either, with prod specific config hard-coded
> everywhere (would need to be converted to a module first).

I wasn't thinking of necessarily copy/pasting the icinga config from prod, but we have a beta labs icinga (http://icinga.wmflabs.org/icinga/) which could theoretically be used for this, no?
Comment 5 Yuvi Panda 2014-07-01 01:09:34 UTC
(In reply to Greg Grossmeier from comment #4) 
> I wasn't thinking of necessarily copy/pasting the icinga config from prod,
> but we have a beta labs icinga (http://icinga.wmflabs.org/icinga/) which
> could theoretically be used for this, no?

Theoretically, yeah :) But that effort was largely undocumented and unpuppetized, and I don't know of anyone who has actually touched that in forever.
Comment 6 Greg Grossmeier 2014-07-01 02:58:50 UTC
(In reply to Yuvi Panda from comment #5)
> Theoretically, yeah :) But that effort was largely undocumented and
> unpuppetized, and I don't know of anyone who has actually touched that in
> forever.

Sad :(
Comment 7 Antoine "hashar" Musso (WMF) 2014-10-24 14:37:24 UTC
We now have notifications on the IRC channel #wikimedia-qa, and a few people receive an hourly email until all instances pass Puppet.

The only thing left to do is to get more people to receive the notifications.
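The "hourly mail until all instances pass" behaviour can be sketched as a report that is empty once every instance is healthy. This is a minimal illustration, not the deployed notifier. The `results` mapping (instance name to whether its last Puppet run succeeded) and the function name are hypothetical, and scheduling the check hourly (e.g. via cron) and delivering the mail are left out.

```python
from typing import Mapping, Optional


def hourly_report(results: Mapping[str, bool]) -> Optional[str]:
    """Return an email body listing failing instances, or None once all pass.

    `results` is assumed to map instance name -> last Puppet run succeeded.
    """
    failing = sorted(host for host, ok in results.items() if not ok)
    if not failing:
        return None  # every instance passes: nothing is sent this hour
    lines = [f"{len(failing)} of {len(results)} Beta Cluster instances failed their last Puppet run:"]
    lines += [f"  - {host}" for host in failing]
    return "\n".join(lines)
```

Returning `None` rather than an "all clear" message is what makes the mail stop by itself once the cluster is healthy.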
Comment 8 Antoine "hashar" Musso (WMF) 2014-10-24 14:38:42 UTC
Blocks Bug 51497 - Setup monitoring for Beta cluster


