Last modified: 2014-07-22 01:04:14 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and for historical purposes. It is not possible to log in and except for displaying bug reports and their history, links might be broken. See T70254, the corresponding Phabricator task for complete and up-to-date bug report information.
Bug 68254 - Jenkins: Job runner slaves in labs no longer updated by puppet
Status: RESOLVED FIXED
Product: Wikimedia
Classification: Unclassified
Component: Continuous integration
Version: wmf-deployment
Hardware: All All
Importance: Highest critical
Target Milestone: ---
Assigned To: Nobody - You can work on this!
Reported: 2014-07-19 02:03 UTC by Krinkle
Modified: 2014-07-22 01:04 UTC (History)




Description Krinkle 2014-07-19 02:03:04 UTC
I don't know how many weeks or months this has been broken, but the logs have been full of failures since at least July 14.

Info: Retrieving plugin
Error: Could not set 'file' on ensure: cannot generate tempfile `/var/lib/puppet/lib/puppet/parser/functions/floor.rb20140719-26226-1g5xssw-9'
Error: Could not set 'file' on ensure: cannot generate tempfile `/var/lib/puppet/lib/puppet/parser/functions/floor.rb20140719-26226-1g5xssw-9'
Wrapped exception:
cannot generate tempfile `/var/lib/puppet/lib/puppet/parser/functions/floor.rb20140719-26226-1g5xssw-9'
Error: /File[/var/lib/puppet/lib/puppet/parser/functions/floor.rb]/ensure: change from absent to file failed: Could not set 'file' on ensure: cannot generate tempfile `/var/lib/puppet/lib/puppet/parser/functions/floor.rb20140719-26226-1g5xssw-9'
Info: Loading facts in /var/lib/puppet/lib/facter/facter_dot_d.rb
Info: Loading facts in /var/lib/puppet/lib/facter/physicalcorecount.rb
Info: Loading facts in /var/lib/puppet/lib/facter/apt.rb
Info: Loading facts in /var/lib/puppet/lib/facter/root_home.rb
Info: Loading facts in /var/lib/puppet/lib/facter/default_gateway.rb
Info: Loading facts in /var/lib/puppet/lib/facter/puppet_vardir.rb
Info: Loading facts in /var/lib/puppet/lib/facter/meminbytes.rb
Info: Loading facts in /var/lib/puppet/lib/facter/ec2id.rb
Info: Loading facts in /var/lib/puppet/lib/facter/pe_version.rb
Info: Loading facts in /var/lib/puppet/lib/facter/puppet_config_dir.rb
Info: Loading facts in /var/lib/puppet/lib/facter/projectgid.rb
Info: Caching catalog for i-000003cb.eqiad.wmflabs
Error: Could not retrieve catalog from remote server: cannot generate tempfile `/var/lib/puppet/client_data/catalog/i-000003cb.eqiad.wmflabs.json20140719-26226-15hd19i-9'
Warning: Not using cache on failed catalog
Error: Could not retrieve catalog; skipping run
Error: Could not save last run local report: cannot generate tempfile `/var/lib/puppet/state/last_run_summary.yaml20140719-26226-n9zv3g-9'
Comment 1 Antoine "hashar" Musso (WMF) 2014-07-19 19:45:20 UTC
/var is full. Someone thought it would be a good idea to allocate only 2GB to /var on labs instances. Once Ubuntu is installed, there are only a few hundred megabytes free :-/
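
For anyone who wants to catch this condition before puppet starts failing, here is a minimal sketch of a free-space check using only the Python standard library. The function names and the 100 MB threshold are hypothetical choices for illustration, not part of any Wikimedia tooling.

```python
import shutil


def free_space_report(path="/"):
    """Return (free_bytes, percent_used) for the filesystem containing `path`."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free
    percent_used = 100.0 * usage.used / usage.total if usage.total else 0.0
    return usage.free, percent_used


def is_nearly_full(path="/", min_free_bytes=100 * 1024 * 1024):
    """True when fewer than `min_free_bytes` remain (threshold is an assumption)."""
    free, _percent_used = free_space_report(path)
    return free < min_free_bytes
```

On an affected slave, calling `is_nearly_full("/var")` would presumably have returned True well before the tempfile errors appeared.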
Comment 2 Antoine "hashar" Musso (WMF) 2014-07-19 19:49:43 UTC
integration-slave1001.eqiad.wmflabs$ du -h /var/log/diamond
1.1G	/var/log/diamond

integration-slave1002$ du -h /var/log/diamond
1.2G	/var/log/diamond

integration-slave1003:~$ du -h /var/log/diamond
1.1G	/var/log/diamond


Basically, the diamond logs have never been rotated; the first entry in the log dates back to May 22nd.
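
The same measurement can be scripted when many instances need checking. This is a rough sketch, not an exact `du` replacement: it sums apparent file sizes, whereas `du` reports allocated disk blocks, so the numbers can differ slightly. Both function names are hypothetical.

```python
import os


def dir_size_bytes(root):
    """Sum the apparent sizes of regular files under `root`."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                if not os.path.islink(path):  # do not follow symlinks
                    total += os.path.getsize(path)
            except OSError:
                # The file vanished or is unreadable; skip it.
                pass
    return total


def human(n):
    """Format a byte count with one decimal and a du-style unit suffix."""
    for unit in ("B", "K", "M", "G", "T"):
        if n < 1024 or unit == "T":
            return f"{n:.1f}{unit}"
        n /= 1024
```

For example, `human(dir_size_bytes("/var/log/diamond"))` would give a figure comparable to the `du -h` output above.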
Comment 3 Antoine "hashar" Musso (WMF) 2014-07-19 19:53:29 UTC
Cleared out /var/log/diamond/diamond.log on the three slaves + on puppetmaster.

We would need an RT ticket to figure out why the diamond logs are not logrotated and whether it affects other instances / production.
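
As a general illustration of bounded log growth (this is not Diamond's actual configuration, which I have not checked), here is a sketch using the Python standard library's `RotatingFileHandler`. The logger name, size limit, and backup count are assumptions picked for the example.

```python
import logging
import logging.handlers


def make_rotating_logger(path, name="rotation-sketch",
                         max_bytes=10 * 1024 * 1024, backups=3):
    """Return a logger whose file is rolled over before it reaches `max_bytes`.

    Disk usage stays at roughly max_bytes * (backups + 1), because the
    oldest backup is deleted on each rollover.
    """
    logger = logging.getLogger(name)  # hypothetical logger name
    logger.setLevel(logging.INFO)
    handler = logging.handlers.RotatingFileHandler(
        path, maxBytes=max_bytes, backupCount=backups
    )
    handler.setFormatter(
        logging.Formatter("%(asctime)s %(levelname)s %(message)s")
    )
    logger.addHandler(handler)
    return logger
```

With a cap like this, a chatty handler can still waste I/O, but it cannot fill a 2GB /var on its own.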
Comment 4 Greg Grossmeier 2014-07-21 21:05:19 UTC
(In reply to Antoine "hashar" Musso from comment #3)
> Cleared out /var/log/diamond/diamond.log on the three slaves + on
> puppetmaster.

Has that improved the puppet situation?

> We would need an RT ticket to figure out why the diamond logs are not logrotated
> and whether it affects other instances / production.

https://rt.wikimedia.org/Ticket/Display.html?id=7945
Comment 5 Chase 2014-07-21 22:09:10 UTC
What is the _latest_ timestamp for these logs? My guess is they are orphaned and can be removed.
Comment 6 Greg Grossmeier 2014-07-21 22:23:22 UTC
Looks like the underlying issue has already been fixed, thanks Chase.

https://bugzilla.wikimedia.org/show_bug.cgi?id=66458

Can someone confirm that puppet is running successfully?
Comment 7 Antoine "hashar" Musso (WMF) 2014-07-22 00:58:35 UTC
(In reply to Chase from comment #5)
> What is the _latest_ timestamp for these logs? My guess is they are
> orphaned and can be removed.

I haven't looked at the last timestamp. The files were definitely being written to, though.

We use our own puppetmaster, which is rebased manually. YuviPanda commented on bug 66458 that:

 It does log, but only logs errors. We killed the archive handler that logged all the *metrics* being sent, which was causing the huge log files.


So I guess that was fixed by a puppet change. Since puppet was/is broken on most instances, the fix never landed on them.


I have to verify all instances now.
Comment 8 Antoine "hashar" Musso (WMF) 2014-07-22 01:04:14 UTC
The logs are smaller now :-)  Thank you!
