Last modified: 2013-03-12 21:24:38 UTC
Some instances of the deployment-prep project are not monitored by Icinga: http://icinga.wmflabs.org/cgi-bin/icinga/status.cgi?hostgroup=deployment-prep&style=detail The NRPE daemon is listening on port 5666, and the project has a default security rule allowing port 5666 from 10.4.0.0/21.
It seems nrpe does not restart as expected: the old process doesn't quit, so the service never really restarts. Since the IP of the monitoring host changed, the config has been updated, but the running service still uses the old IP. To resolve this, run `killall nrpe; /etc/init.d/nagios-nrpe-server start` on the instances. I'm trying to get Ryan to run this labs-wide via salt to clean up the currently alerting ones.
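For anyone wanting something a bit more careful than the one-liner, here is a minimal sketch that waits for the old process to actually exit before starting the service again. It assumes the process is named `nrpe`, that `pgrep`/`pkill` (procps) are installed, and that the init script lives at the path quoted above. The function names `wait_for_exit` and `restart_nrpe` and the `NRPE_INIT` variable are made up for illustration, and the timeouts are arbitrary.

```shell
#!/bin/sh
# Sketch only: restart NRPE after confirming the old process is gone.
# Assumptions: process name "nrpe", pgrep/pkill available, init script
# path as quoted in this report. NRPE_INIT is a hypothetical override.

NRPE_INIT="${NRPE_INIT:-/etc/init.d/nagios-nrpe-server}"

# Poll once per second until no process named $1 remains.
# $2 is the maximum number of one-second waits (default 10).
# Returns 0 once the process is gone, 1 on timeout.
wait_for_exit() {
    name="$1"
    tries="${2:-10}"
    while pgrep -x "$name" >/dev/null 2>&1; do
        [ "$tries" -le 0 ] && return 1
        tries=$((tries - 1))
        sleep 1
    done
    return 0
}

# Stop via the init script, escalate to SIGTERM and then SIGKILL if
# the daemon lingers, and only then start it again.
restart_nrpe() {
    "$NRPE_INIT" stop || true
    if ! wait_for_exit nrpe 5; then
        pkill -x nrpe || true            # SIGTERM to leftover processes
        if ! wait_for_exit nrpe 5; then
            pkill -9 -x nrpe || true     # last resort: SIGKILL
            wait_for_exit nrpe 5 || return 1
        fi
    fi
    "$NRPE_INIT" start
}
```

Sourcing the file and calling `restart_nrpe` as root on an affected instance should have the same effect as the `killall` workaround, but it will not start a second daemon while the old one is still holding the port.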
Yeah, this sounds like a problem we had in production before: nagios-nrpe-server would have issues restarting correctly. It looked to me as though this had been resolved after the switch to Icinga (we cleaned up, including getting rid of an old init script for the nrpe server). In the past we attempted to fix it by adding a sleep command to the init script.
root@virt0:~# salt '*' cmd.run "killall nrpe; /etc/init.d/nagios-nrpe-server start"
Killed and restarted on all instances.
Works for me now :-] Thanks Daniel!