Last modified: 2014-07-02 08:38:48 UTC
Puppet on Beta Cluster broke sometime in the last week-ish; let's figure out where and fix it :)
Report on beta puppet run status:
* ssh deployment-salt.eqiad.wmflabs
* sudo salt '*' cmd.run '(hostname; date -d @$(grep last_run /var/lib/puppet/state/last_run_summary.yaml | awk "{print \$2}") +%Y-%m-%dT%H:%M:%S; grep status: /var/lib/puppet/state/last_run_report.yaml | head -1 | awk "{print \$2}")|tr -s [:space:] "\t"' | grep deployment- | sort

Hosts currently failing:
* deployment-bastion
* deployment-graphite
* deployment-jobrunner01
* deployment-logstash1
* deployment-videoscaler01
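The per-host part of the one-liner above is easier to follow when unpacked into a small function. This is only a sketch of the same idea, not the exact command that was run: the function name `puppet_run_report` and its optional `state_dir` argument are made up for illustration, and it assumes GNU `date` (for `-d @<epoch>`) and that the fields of interest appear as `last_run: <epoch>` and `status: <value>` lines in the two state files.

```shell
# Sketch: print "<hostname> <last run time> <status>" (tab-separated) for one host.
# The state_dir argument is a hypothetical addition for testing; the default
# is the path used in the one-liner above. Requires GNU date for "-d @<epoch>".
puppet_run_report() {
    state_dir="${1:-/var/lib/puppet/state}"
    # last_run is stored as a Unix epoch timestamp in the summary file.
    last_run=$(awk '$1 == "last_run:" { print $2; exit }' "$state_dir/last_run_summary.yaml")
    # Take the first "status:" line in the report, as the one-liner does.
    status=$(awk '$1 == "status:" { print $2; exit }' "$state_dir/last_run_report.yaml")
    printf '%s\t%s\t%s\n' "$(hostname)" "$(date -d "@$last_run" +%Y-%m-%dT%H:%M:%S)" "$status"
}
```

Called with no argument, the function reads the default state directory on the local host. The timestamp is formatted in the host's local time zone, as in the original command.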
Several of the failures reported above were caused by the local hacks that had been put in place to get puppet to run on deployment-apache0[12]. I have stashed those changes and am forcing a puppet run across the cluster to get a more accurate report.
With the local hacks removed and a bad lock file deleted on deployment-jobrunner01, all hosts in beta are now reporting their latest puppet run as successful:

deployment-analytics01 2014-07-01T23:07:07 success
deployment-apache01 2014-07-01T23:09:50 success
deployment-apache02 2014-07-01T23:10:51 success
deployment-bastion 2014-07-01T23:17:13 success
deployment-cache-bits01 2014-07-01T23:07:58 success
deployment-cache-mobile03 2014-07-01T23:16:54 success
deployment-cache-text02 2014-07-01T22:59:29 success
deployment-cache-upload02 2014-07-01T23:11:50 success
deployment-db1 2014-07-01T23:13:30 success
deployment-elastic01 2014-07-01T23:16:40 success
deployment-elastic02 2014-07-01T23:12:00 success
deployment-elastic03 2014-07-01T23:03:58 success
deployment-elastic04 2014-07-01T23:17:53 success
deployment-eventlogging02 2014-07-01T23:03:34 success
deployment-fluoride 2014-07-01T23:04:05 success
deployment-graphite 2014-07-01T23:18:06 success
deployment-jobrunner01 2014-07-01T23:15:19 success
deployment-logstash1 2014-07-01T23:13:50 success
deployment-memc02 2014-07-01T23:05:08 success
deployment-memc04 2014-07-01T23:08:29 success
deployment-memc05 2014-07-01T23:08:14 success
deployment-parsoid04 2014-07-01T22:59:41 success
deployment-parsoidcache01 2014-06-21T16:18:31 success
deployment-pdf01 2014-07-01T23:02:53 success
deployment-pdf01 2014-07-01T23:02:53 success
deployment-redis01 2014-07-01T23:02:08 success
deployment-rsync01 2014-07-01T23:10:48 success
deployment-salt 2014-07-01T23:10:50 changed
deployment-stream 2014-07-01T23:10:54 success
deployment-stream 2014-07-01T23:10:54 success
deployment-upload 2014-07-01T23:03:05 success
deployment-videoscaler01 2014-07-01T23:06:39 success

Many thanks to Ori and anyone else who helped get this fixed.
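In the report above, one host shows a status of `changed` rather than `success`, and one timestamp (2014-06-21) is much older than the rest. Lines like these may be worth a second look. The following is a rough sketch of a filter for picking them out of the report; the function name `flag_suspect_hosts` and the cutoff argument are hypothetical, and it assumes each input line has the form `<host> <timestamp> <status>` with ISO-style timestamps, which sort correctly as plain strings.

```shell
# Sketch: read "<host> <timestamp> <status>" lines on stdin and print those
# whose status is not "success" or whose timestamp is older than the cutoff.
# The cutoff is a hypothetical parameter, e.g. 2014-07-01T00:00:00.
flag_suspect_hosts() {
    cutoff="$1"
    awk -v cutoff="$cutoff" '
        # Appending "" forces a string comparison of the ISO timestamps.
        NF >= 3 && ($3 != "success" || ($2 "") < (cutoff "")) { print $1, $2, $3 }
    '
}
```

For example, the sorted output of the salt command could be piped into `flag_suspect_hosts 2014-07-01T00:00:00` to list only the hosts that have not reported a fresh, successful run.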
Well done! Getting failures like this reported is tracked as Bug 67333 - Yell loudly of failed puppet runs on Beta Cluster instances, which depends on Bug 63296 - puppet labsstatus not reported when using role::puppet::self.