Last modified: 2014-10-20 14:00:38 UTC
Created attachment 16814
Graph: Memory last week - Nagf

See also:
https://bugzilla.wikimedia.org/show_bug.cgi?id=68260
https://bugzilla.wikimedia.org/show_bug.cgi?id=72014

As of ~23:00 UTC on October 15, puppet is failing on integration-dev-precise because of an error while provisioning elasticsearch: File[/var/lib/elasticsearch] cannot be changed from a directory to a link.

Before, on the morning of 2014-10-15:

Oct 15 07:01:02 integration-dev-precise puppet-agent[14037]: Sleeping for 26 seconds (splay is enabled)
Oct 15 07:01:28 integration-dev-precise puppet-agent[14037]: Retrieving plugin
Oct 15 07:01:30 integration-dev-precise puppet-agent[14037]: Loading facts
..
Oct 15 07:01:35 integration-dev-precise puppet-agent[14037]: Caching catalog for i-00000650.eqiad.wmflabs
Oct 15 07:01:36 integration-dev-precise puppet-agent[14037]: Applying configuration version '1413356223'
..
Oct 15 07:01:57 integration-dev-precise puppet-agent[14037]: hostname: integration-dev-precise
Oct 15 07:01:57 integration-dev-precise puppet-agent[14037]: (/Stage[main]/Role::Labs::Instance/Notify[hostname: integration-dev-precise]/message) defined 'message' as 'hostname: integration-dev-precise'
Oct 15 07:01:58 integration-dev-precise kernel: [1157554.617854] init: ganglia-monitor main process (14773) terminated with status 1
Oct 15 07:01:58 integration-dev-precise kernel: [1157554.617881] init: ganglia-monitor main process ended, respawning
Oct 15 07:01:58 integration-dev-precise puppet-agent[14037]: (/Stage[main]/Ganglia_new::Monitor::Service/Service[ganglia-monitor]/ensure) ensure changed 'stopped' to 'running'
Oct 15 07:01:58 integration-dev-precise puppet-agent[14037]: (/Stage[main]/Ganglia_new::Monitor::Service/Service[ganglia-monitor]) Unscheduling refresh on Service[ganglia-monitor]
Oct 15 07:01:58 integration-dev-precise kernel: [1157554.624667] init: ganglia-monitor main process (14774) terminated with status 1
Oct 15 07:01:58 integration-dev-precise kernel: [1157554.624694] init: ganglia-monitor main process ended, respawning
..
Oct 15 07:01:58 integration-dev-precise kernel: [1157554.678814] init: ganglia-monitor main process (14787) terminated with status 1
Oct 15 07:01:58 integration-dev-precise kernel: [1157554.678840] init: ganglia-monitor respawning too fast, stopped
Oct 15 07:02:01 integration-dev-precise puppet-agent[14037]: Finished catalog run in 25.22 seconds

After, shortly before midnight on 2014-10-15:

Oct 15 23:41:04 integration-dev-precise puppet-agent[5268]: Sleeping for 39 seconds (splay is enabled)
Oct 15 23:41:43 integration-dev-precise puppet-agent[5268]: Retrieving plugin
Oct 15 23:41:45 integration-dev-precise puppet-agent[5268]: Loading facts
..
Oct 15 23:41:53 integration-dev-precise puppet-agent[5268]: Caching catalog for i-00000650.eqiad.wmflabs
Oct 15 23:41:58 integration-dev-precise puppet-agent[5268]: Applying configuration version '1413416388'
..
Oct 15 23:42:37 integration-dev-precise puppet-agent[5268]: hostname: integration-dev-precise
Oct 15 23:42:37 integration-dev-precise puppet-agent[5268]: (/Stage[main]/Role::Labs::Instance/Notify[hostname: integration-dev-precise]/message) defined 'message' as 'hostname: integration-dev-precise'
Oct 15 23:42:41 integration-dev-precise puppet-agent[5268]: (/Stage[main]/Ganglia_new::Monitor::Service/Service[ganglia-monitor]/ensure) ensure changed 'stopped' to 'running'
Oct 15 23:42:41 integration-dev-precise kernel: [1217597.861552] init: ganglia-monitor main process (6548) terminated with status 1
Oct 15 23:42:41 integration-dev-precise kernel: [1217597.861576] init: ganglia-monitor main process ended, respawning
Oct 15 23:42:41 integration-dev-precise puppet-agent[5268]: (/Stage[main]/Ganglia_new::Monitor::Service/Service[ganglia-monitor]) Unscheduling refresh on Service[ganglia-monitor]
Oct 15 23:42:41 integration-dev-precise kernel: [1217597.868022] init: ganglia-monitor main process (6549) terminated with status 1
Oct 15 23:42:41 integration-dev-precise kernel: [1217597.868049] init: ganglia-monitor main process ended, respawning
..
Oct 15 23:42:41 integration-dev-precise kernel: [1217597.930890] init: ganglia-monitor main process (6561) terminated with status 1
Oct 15 23:42:41 integration-dev-precise kernel: [1217597.930913] init: ganglia-monitor respawning too fast, stopped
Oct 15 23:42:41 integration-dev-precise puppet-agent[5268]: (/Stage[main]/Role::Labs::Lvm::Mnt/Labs_lvm::Volume[second-local-disk]/Labs_lvm::Extend[/mnt]/Exec[extend-vd-/mnt]/returns) executed successfully
Oct 15 23:42:48 integration-dev-precise puppet-agent[5268]: (/Stage[main]/Role::Ci::Slave::Browsertests/File[/var/lib/elasticsearch]) Not removing directory; use 'force' to override
Oct 15 23:42:48 integration-dev-precise puppet-agent[5268]: (/Stage[main]/Role::Ci::Slave::Browsertests/File[/var/lib/elasticsearch]) Not removing directory; use 'force' to override
Oct 15 23:42:48 integration-dev-precise puppet-agent[5268]: Could not remove existing file
Oct 15 23:42:48 integration-dev-precise puppet-agent[5268]: (/Stage[main]/Role::Ci::Slave::Browsertests/File[/var/lib/elasticsearch]/ensure) change from directory to link failed: Could not remove existing file
Oct 15 23:42:48 integration-dev-precise puppet-agent[5268]: (/Stage[main]/Elasticsearch/File[/var/log/elasticsearch/elasticsearch_index_search_slowlog.log]) Dependency File[/var/lib/elasticsearch] has failures: true
Oct 15 23:42:48 integration-dev-precise puppet-agent[5268]: (/Stage[main]/Elasticsearch/File[/var/log/elasticsearch/elasticsearch_index_search_slowlog.log]) Skipping because of failed dependencies
Oct 15 23:42:48 integration-dev-precise puppet-agent[5268]: (/Stage[main]/Elasticsearch/File[/var/log/elasticsearch/elasticsearch_index_indexing_slowlog.log]) Dependency File[/var/lib/elasticsearch] has failures: true
Oct 15 23:42:48 integration-dev-precise puppet-agent[5268]: (/Stage[main]/Elasticsearch/File[/var/log/elasticsearch/elasticsearch_index_indexing_slowlog.log]) Skipping because of failed dependencies
Oct 15 23:42:48 integration-dev-precise puppet-agent[5268]: (/Stage[main]/Elasticsearch/File[/etc/logrotate.d/elasticsearch]) Dependency File[/var/lib/elasticsearch] has failures: true
Oct 15 23:42:48 integration-dev-precise puppet-agent[5268]: (/Stage[main]/Elasticsearch/File[/etc/logrotate.d/elasticsearch]) Skipping because of failed dependencies
Oct 15 23:42:48 integration-dev-precise puppet-agent[5268]: (/Stage[main]/Elasticsearch/File[/etc/elasticsearch/elasticsearch.yml]) Dependency File[/var/lib/elasticsearch] has failures: true
Oct 15 23:42:48 integration-dev-precise puppet-agent[5268]: (/Stage[main]/Elasticsearch/File[/etc/elasticsearch/elasticsearch.yml]) Skipping because of failed dependencies
Oct 15 23:42:48 integration-dev-precise puppet-agent[5268]: (/Stage[main]/Elasticsearch/File[/var/log/elasticsearch/elasticsearch.log]) Dependency File[/var/lib/elasticsearch] has failures: true
Oct 15 23:42:48 integration-dev-precise puppet-agent[5268]: (/Stage[main]/Elasticsearch/File[/var/log/elasticsearch/elasticsearch.log]) Skipping because of failed dependencies
Oct 15 23:42:48 integration-dev-precise puppet-agent[5268]: (/Stage[main]/Elasticsearch/File[/etc/elasticsearch/logging.yml]) Dependency File[/var/lib/elasticsearch] has failures: true
Oct 15 23:42:48 integration-dev-precise puppet-agent[5268]: (/Stage[main]/Elasticsearch/File[/etc/elasticsearch/logging.yml]) Skipping because of failed dependencies
Oct 15 23:42:48 integration-dev-precise puppet-agent[5268]: (/Stage[main]/Elasticsearch/File[/etc/default/elasticsearch]) Dependency File[/var/lib/elasticsearch] has failures: true
Oct 15 23:42:48 integration-dev-precise puppet-agent[5268]: (/Stage[main]/Elasticsearch/File[/etc/default/elasticsearch]) Skipping because of failed dependencies
Oct 15 23:42:48 integration-dev-precise puppet-agent[5268]: (/Stage[main]/Elasticsearch/Service[elasticsearch]) Dependency File[/var/lib/elasticsearch] has failures: true
Oct 15 23:42:48 integration-dev-precise puppet-agent[5268]: (/Stage[main]/Elasticsearch/Service[elasticsearch]) Skipping because of failed dependencies
Oct 15 23:42:49 integration-dev-precise puppet-agent[5268]: Finished catalog run in 53.29 seconds

Attached: Graphs of the relevant time period from https://tools.wmflabs.org/nagf/?project=integration#h_integration-dev-precise_memory
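
For anyone triaging a similar run, the log above boils down to one failed resource plus a set of resources skipped because they depend on it. A minimal sketch of extracting that summary from puppet-agent syslog lines follows. The function name `summarize` and the regular expression are hypothetical, and the matched message strings are taken only from the excerpt above; other Puppet versions may word them differently.

```python
import re

# Assumption: lines look like the syslog excerpt above, i.e.
# "<date> <host> puppet-agent[<pid>]: (<resource path>) <message>".
LINE_RE = re.compile(
    r"puppet-agent\[\d+\]: \((?P<resource>/Stage\[main\]/.+?)\) (?P<message>.*)$"
)


def summarize(lines):
    """Return (failed, skipped) lists of resource paths, in log order."""
    failed, skipped = [], []
    for line in lines:
        match = LINE_RE.search(line)
        if match is None:
            continue  # kernel lines and path-less agent lines are ignored
        resource = match.group("resource")
        message = match.group("message")
        if "failed:" in message:
            # e.g. "change from directory to link failed: ..."
            if resource not in failed:
                failed.append(resource)
        elif message.startswith("Skipping because of failed dependencies"):
            skipped.append(resource)
    return failed, skipped
```

Applied to the "After" run, this would report the `File[/var/lib/elasticsearch]/ensure` change as the failure and the Elasticsearch files and service as skipped, which matches the reading that the directory-to-link change is the root cause.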
Created attachment 16815
Graph: Disk space last week - Nagf

The /mnt mount first appears during this puppet run.
Created attachment 16816
Graph: Puppet runs last week - Nagf

Puppet starts failing at about 23:00 UTC on October 15.