Last modified: 2014-07-02 08:38:48 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and for historical purposes. It is not possible to log in, and except for displaying bug reports and their history, links might be broken. See T69349, the corresponding Phabricator task, for complete and up-to-date bug report information.
Bug 67349 - Investigate broken puppet on Beta Cluster
Status: RESOLVED FIXED
Product: Wikimedia Labs
Classification: Unclassified
Component: deployment-prep (beta)
Version: unspecified
Hardware: All All
Importance: High normal
Target Milestone: ---
Assigned To: Ori Livneh
Depends on:
Blocks:
Reported: 2014-07-01 06:50 UTC by Greg Grossmeier
Modified: 2014-07-02 08:38 UTC
CC List: 8 users

See Also:
Web browser: ---
Mobile Platform: ---
Assignee Huggle Beta Tester: ---


Description Greg Grossmeier 2014-07-01 06:50:03 UTC
Puppet on Beta Cluster broke sometime in the last week-ish; let's figure out where and fix it :)
Comment 1 Bryan Davis 2014-07-01 22:35:42 UTC
Report on beta puppet run status:
* ssh deployment-salt.eqiad.wmflabs
* sudo salt '*' cmd.run '(hostname; date -d @$(grep last_run /var/lib/puppet/state/last_run_summary.yaml | awk "{print \$2}") +%Y-%m-%dT%H:%M:%S; grep status: /var/lib/puppet/state/last_run_report.yaml | head -1 | awk "{print \$2}")|tr -s "[:space:]" "\t"' | grep deployment- | sort

Hosts currently failing:
* deployment-bastion
* deployment-graphite
* deployment-jobrunner01
* deployment-logstash1
* deployment-videoscaler01
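The salt one-liner above is dense. Unpacked into a small shell function, the per-host part reads roughly as follows. This is a sketch, not the command that was actually run: the function name `puppet_run_status` and its optional state-directory argument are hypothetical, and it assumes the same Puppet state files (`last_run_summary.yaml`, `last_run_report.yaml`) and the same "first matching line" parsing the one-liner relies on. It prints the time in UTC and uses `uname -n` in place of `hostname`.

```shell
# Hypothetical helper: print "<host>\t<last run, UTC>\t<status>" for the
# local instance, mirroring the parsing done by the salt one-liner.
# The state directory defaults to the path used in the one-liner.
puppet_run_status() {
    local state_dir="${1:-/var/lib/puppet/state}"
    local last_run status

    # Epoch seconds from the first "last_run:" line of the run summary.
    last_run=$(awk '/last_run:/ {print $2; exit}' "$state_dir/last_run_summary.yaml")

    # Value of the first "status:" line in the run report, as the one-liner does.
    status=$(awk '/status:/ {print $2; exit}' "$state_dir/last_run_report.yaml")

    # GNU date converts the epoch value into an ISO 8601 timestamp.
    printf '%s\t%s\t%s\n' "$(uname -n)" \
        "$(date -u -d "@$last_run" +%Y-%m-%dT%H:%M:%S)" "$status"
}
```

This only covers a single instance; fanning it out across minions is still what the salt `cmd.run` invocation above is for.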
Comment 2 Bryan Davis 2014-07-01 22:57:06 UTC
Several of the failures reported above were caused by the local hacks that had been put in place to get puppet to run on deployment-apache0[12]. I have stashed those changes and am forcing a puppet run across the cluster to get a more accurate report.
Comment 3 Bryan Davis 2014-07-01 23:20:51 UTC
With the local hacks removed and a bad lock file deleted on deployment-jobrunner01, all hosts in beta are now reporting their latest puppet run as successful:

    deployment-analytics01      2014-07-01T23:07:07     success
    deployment-apache01         2014-07-01T23:09:50     success
    deployment-apache02         2014-07-01T23:10:51     success
    deployment-bastion          2014-07-01T23:17:13     success
    deployment-cache-bits01     2014-07-01T23:07:58     success
    deployment-cache-mobile03   2014-07-01T23:16:54     success
    deployment-cache-text02     2014-07-01T22:59:29     success
    deployment-cache-upload02   2014-07-01T23:11:50     success
    deployment-db1              2014-07-01T23:13:30     success
    deployment-elastic01        2014-07-01T23:16:40     success
    deployment-elastic02        2014-07-01T23:12:00     success
    deployment-elastic03        2014-07-01T23:03:58     success
    deployment-elastic04        2014-07-01T23:17:53     success
    deployment-eventlogging02   2014-07-01T23:03:34     success
    deployment-fluoride         2014-07-01T23:04:05     success
    deployment-graphite         2014-07-01T23:18:06     success
    deployment-jobrunner01      2014-07-01T23:15:19     success
    deployment-logstash1        2014-07-01T23:13:50     success
    deployment-memc02           2014-07-01T23:05:08     success
    deployment-memc04           2014-07-01T23:08:29     success
    deployment-memc05           2014-07-01T23:08:14     success
    deployment-parsoid04        2014-07-01T22:59:41     success
    deployment-parsoidcache01   2014-06-21T16:18:31     success
    deployment-pdf01            2014-07-01T23:02:53     success
    deployment-pdf01            2014-07-01T23:02:53     success
    deployment-redis01          2014-07-01T23:02:08     success
    deployment-rsync01          2014-07-01T23:10:48     success
    deployment-salt             2014-07-01T23:10:50     changed
    deployment-stream           2014-07-01T23:10:54     success
    deployment-stream           2014-07-01T23:10:54     success
    deployment-upload           2014-07-01T23:03:05     success
    deployment-videoscaler01    2014-07-01T23:06:39     success

Many thanks to Ori and anyone else who helped get this fixed.
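A status table like the one in comment 3 can also be screened mechanically for hosts that need attention. The sketch below is illustrative only: `flag_puppet_hosts` is a hypothetical helper, treating `success` and `changed` as healthy is an assumption based on the only two values that appear in the table, and the staleness cutoff is supplied by the caller. It relies on every timestamp sharing one ISO 8601 format, so plain string comparison orders them correctly.

```shell
# Hypothetical helper: read "host time status" lines on stdin (whitespace-
# or tab-separated, as produced by the salt one-liner) and print hosts
# whose status is not success/changed, or whose last run predates $1.
flag_puppet_hosts() {
    local cutoff="$1"
    awk -v cutoff="$cutoff" '
        # Any status outside the assumed healthy set is reported first.
        $3 != "success" && $3 != "changed" { print $1 "\t" "status=" $3; next }
        # Same-format ISO 8601 strings compare correctly as plain strings.
        $2 < cutoff                        { print $1 "\t" "stale since " $2 }
    ' | sort -u   # sort -u also collapses hosts that salt reported twice
}
```

Fed the table from comment 3 with a cutoff of 2014-07-01T00:00:00, it would print only deployment-parsoidcache01, whose last recorded run is dated 2014-06-21.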
Comment 4 Antoine "hashar" Musso (WMF) 2014-07-02 08:38:48 UTC
Well done!

Having such failures reported loudly is tracked in bug 67333 - Yell loudly of failed puppet runs on Beta Cluster instances,

which depends on bug 63296 - puppet labsstatus not reported when using role::puppet::self
