Last modified: 2012-07-29 19:17:29 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and kept for historical purposes. It is not possible to log in, and apart from displaying bug reports and their history, links might be broken. See T40639, the corresponding Phabricator task, for complete and up-to-date bug report information.
Bug 38639 - puppetd can't run: LDAP Search failed
Status: RESOLVED FIXED
Product: Wikimedia Labs
Classification: Unclassified
Component: General
Version: unspecified
Hardware: All All
Importance: Unprioritized major
Target Milestone: ---
Assigned To: Ryan Lane
Reported: 2012-07-24 12:29 UTC by Antoine "hashar" Musso (WMF)
Modified: 2012-07-29 19:17 UTC (History)

Description Antoine "hashar" Musso (WMF) 2012-07-24 12:29:06 UTC
There seems to be an issue with the LDAP backend:


root@deployment-apache32:~# puppetd -tv
info: Loading facts in /var/lib/puppet/lib/facter/apt.rb
info: Loading facts in /var/lib/puppet/lib/facter/projectgid.rb
info: Loading facts in /var/lib/puppet/lib/facter/default_gateway.rb
err: Could not retrieve catalog from remote server: Error 400 on SERVER: Failed when searching for node i-0000031a.pmtpa.wmflabs: LDAP Search failed
warning: Not using cache on failed catalog
err: Could not retrieve catalog; skipping run
Comment 1 Faidon Liambotis 2012-07-24 12:41:14 UTC
OpenDJ was running but was not listening on port 389. There was nothing in the logs (besides the usual virt1 replication errors, which kept going). I restarted OpenDJ and everything seems to be working again.

Assigning to Ryan who knows OpenDJ better, maybe he can find the root cause.
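
The symptom described above (process alive, nothing accepting connections on 389) can be checked from a client with a plain TCP connect. A minimal sketch, assuming the LDAP server is expected on the standard port 389; the function name and the host in the usage line are illustrative and not part of the Labs tooling:

```python
import socket


def is_port_open(host, port=389, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the name and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False


if __name__ == "__main__":
    # "localhost" is a placeholder; substitute the LDAP host to be checked.
    print(is_port_open("localhost", 389))
```

This only shows that something accepts TCP connections on the port; it does not confirm that LDAP searches succeed.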
Comment 2 Antoine "hashar" Musso (WMF) 2012-07-24 13:06:15 UTC
Works for me now :)
Comment 3 Antoine "hashar" Musso (WMF) 2012-07-24 18:29:43 UTC
We had a Nagios alert reporting that virt0 was not listening for LDAP connections.
Comment 4 Ryan Lane 2012-07-24 18:32:09 UTC
Hm. It's possible the iptables NAT rules were somehow removed. Restarting will clear them and re-add them. I can't really check now. We should check for that if it happens again.
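
One way to check for the missing NAT rules the next time this happens is to filter the output of `iptables -t nat -S` for rules that match destination port 389. A rough sketch that works on the command's text output; the exact rules used on the LDAP hosts are not given in this report, so the sample rule in the usage part (a redirect from 389 to 1389) is purely hypothetical:

```python
def nat_rules_for_port(rules_text, port=389):
    """Return the rule lines in `iptables -t nat -S` output that match --dport <port>."""
    wanted = str(port)
    matches = []
    for line in rules_text.splitlines():
        tokens = line.split()
        # Look for the token pair "--dport <port>" anywhere in the rule.
        for i in range(len(tokens) - 1):
            if tokens[i] == "--dport" and tokens[i + 1] == wanted:
                matches.append(line.strip())
                break
    return matches


if __name__ == "__main__":
    # Hypothetical sample output, for illustration only.
    sample = (
        "-P PREROUTING ACCEPT\n"
        "-A PREROUTING -p tcp -m tcp --dport 389 -j REDIRECT --to-ports 1389\n"
    )
    for rule in nat_rules_for_port(sample):
        print(rule)
```

An empty result for port 389 would be consistent with the rules having been removed.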
Comment 5 Aude 2012-07-29 14:05:42 UTC
Still getting errors, and I can't successfully create a new instance.

12:32 <+nagios-wm> PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
12:45 <+nagios-wm> RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.023 seconds

Jul 29 13:43:07 i-00000373 puppet-agent[3935]: Starting Puppet client version 2.7.11
Jul 29 13:43:07 i-00000373 puppet-agent[3935]: Could not retrieve catalog from remote server: Error 400 on SERVER: Failed when searching for node i-00000373.pmtpa.wmflabs: LDAP Search failed
Jul 29 13:43:07 i-00000373 puppet-agent[3935]: Using cached catalog
Jul 29 13:43:07 i-00000373 puppet-agent[3935]: Could not retrieve catalog; skipping run

There is also a lot of dhclient output, repeatedly retrying (I don't know if that's normal, but I doubt it).
Comment 6 Ryan Lane 2012-07-29 19:17:29 UTC
It seems it happened again. I'm going to investigate the server failure and why the clients failed to fail over properly.
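
For reference, the failover behaviour expected of the clients can be sketched as "try each LDAP server in order and use the first one that answers". This is only an illustration of the idea, not how puppet or the Labs LDAP clients actually implement it; the function names are made up, and the use of virt0 and virt1 as an ordered server list is an assumption based on the hosts mentioned in this report:

```python
import socket


def tcp_probe(host, port=389, timeout=3.0):
    """Return True if host accepts a TCP connection on port within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def first_reachable(servers, probe=tcp_probe):
    """Return the first server for which probe(server) is true, or None."""
    for host in servers:
        if probe(host):
            return host
    return None


if __name__ == "__main__":
    # Assumed ordering: primary first, replica second.
    print(first_reachable(["virt0", "virt1"]))
```

The probe is passed in as a parameter so the selection logic can be exercised without a network.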
