Last modified: 2014-10-10 19:11:45 UTC
Creating a new instance with the precise image fails and leaves the instance inaccessible from ssh. I wanted to create an additional integration-slave running Precise to scale out our Jenkins pool, but it failed to provision properly.

https://wikitech.wikimedia.org/w/index.php?title=Special:NovaInstance&action=consoleoutput&project=integration&instanceid=b65a604d-40ef-4b16-b527-bfb862ca3904&region=eqiad

Oct 7 08:54:09 integration-slave1004 puppet-agent[981]: Enabling Puppet.
Oct 7 08:54:09 integration-slave1004 puppet-agent[773]: Could not request certificate: getaddrinfo: Name or service not known
Oct 7 08:54:10 integration-slave1004 puppet-agent[932]: Could not request certificate: getaddrinfo: Name or service not known
Oct 7 08:55:11 integration-slave1004 nslcd[901]: [b0dc51] <group/member="root"> ldap_start_tls_s() failed: Can't contact LDAP server: Connection timed out (uri="ldap://virt0.wikimedia.org:389")
Oct 7 08:55:11 integration-slave1004 nslcd[901]: [b0dc51] <group/member="root"> failed to bind to LDAP server ldap://virt0.wikimedia.org:389: Can't contact LDAP server: Connection refused
Oct 7 08:55:11 integration-slave1004 nslcd[901]: [334873] <group/member="root"> ldap_start_tls_s() failed: Can't contact LDAP server: Connection timed out (uri="ldap://virt0.wikimedia.org:389")
Oct 7 08:55:11 integration-slave1004 nslcd[901]: [334873] <group/member="root"> failed to bind to LDAP server ldap://virt0.wikimedia.org:389: Can't contact LDAP server: Connection timed out
Oct 7 08:55:12 integration-slave1004 nslcd[901]: [b0dc51] <group/member="root"> connected to LDAP server ldap://virt1000.wikimedia.org:389
Oct 7 08:55:12 integration-slave1004 nslcd[901]: [b0dc51] <group/member="root"> ldap_result() failed: No such object
Oct 7 08:55:12 integration-slave1004 nslcd[901]: [b0dc51] <group/member="root"> ldap_result() failed: No such object
Oct 7 08:55:13 integration-slave1004 nslcd[901]: [334873] <group/member="root"> connected to LDAP server ldap://virt1000.wikimedia.org:389
Oct 7 08:55:13 integration-slave1004 nslcd[901]: [334873] <group/member="root"> ldap_result() failed: No such object
Oct 7 08:55:13 integration-slave1004 nslcd[901]: [334873] <group/member="root"> ldap_result() failed: No such object
..
Oct 7 08:55:19 integration-slave1004 puppet-agent[1218]: Creating a new SSL key for i-00000670.eqiad.wmflabs
..
Oct 7 08:55:28 integration-slave1004 nslcd[1059]: [3c9869] <group(all)> ldap_start_tls_s() failed: Can't contact LDAP server: Connection timed out (uri="ldap://virt0.wikimedia.org:389")
Oct 7 08:55:28 integration-slave1004 nslcd[1059]: [3c9869] <group(all)> failed to bind to LDAP server ldap://virt0.wikimedia.org:389: Can't contact LDAP server: Connection timed out
Oct 7 08:55:29 integration-slave1004 nslcd[1059]: [3c9869] <group(all)> connected to LDAP server ldap://virt1000.wikimedia.org:389
Oct 7 08:55:29 integration-slave1004 nslcd[1059]: [3c9869] <group(all)> ldap_result() failed: No such object
Oct 7 08:55:29 integration-slave1004 nslcd[1059]: [7b23c6] <group/member="puppet"> ldap_start_tls_s() failed: Can't contact LDAP server: Connection timed out (uri="ldap://virt0.wikimedia.org:389")
Oct 7 08:55:29 integration-slave1004 nslcd[1059]: [7b23c6] <group/member="puppet"> failed to bind to LDAP server ldap://virt0.wikimedia.org:389: Can't contact LDAP server: Connection timed out
Oct 7 08:55:29 integration-slave1004 nslcd[1059]: [7b23c6] <group/member="puppet"> connected to LDAP server ldap://virt1000.wikimedia.org:389
Oct 7 08:55:29 integration-slave1004 nslcd[1059]: [7b23c6] <group/member="puppet"> ldap_result() failed: No such object
Oct 7 08:55:29 integration-slave1004 nslcd[1059]: [7b23c6] <group/member="puppet"> ldap_result() failed: No such object
Oct 7 08:55:30 integration-slave1004 nslcd[1059]: [334873] <group/member="puppet"> ldap_start_tls_s() failed: Can't contact LDAP server: Connection timed out (uri="ldap://virt0.wikimedia.org:389")
Oct 7 08:55:30 integration-slave1004 nslcd[1059]: [334873] <group/member="puppet"> failed to bind to LDAP server ldap://virt0.wikimedia.org:389: Can't contact LDAP server: Connection timed out
Oct 7 08:55:30 integration-slave1004 nslcd[1059]: [334873] <group/member="puppet"> connected to LDAP server ldap://virt1000.wikimedia.org:389
Oct 7 08:55:30 integration-slave1004 nslcd[1059]: [334873] <group/member="puppet"> ldap_result() failed: No such object
Oct 7 08:55:30 integration-slave1004 nslcd[1059]: [334873] <group/member="puppet"> ldap_result() failed: No such object
Oct 7 08:55:33 integration-slave1004 nslcd[1059]: [b0dc51] <group/member="puppet"> ldap_start_tls_s() failed: Can't contact LDAP server: Connection timed out (uri="ldap://virt0.wikimedia.org:389")
Oct 7 08:55:33 integration-slave1004 nslcd[1059]: [b0dc51] <group/member="puppet"> failed to bind to LDAP server ldap://virt0.wikimedia.org:389: Can't contact LDAP server: Connection timed out
Oct 7 08:55:33 integration-slave1004 nslcd[1059]: [b0dc51] <group/member="puppet"> connected to LDAP server ldap://virt1000.wikimedia.org:389
Oct 7 08:55:33 integration-slave1004 nslcd[1059]: [b0dc51] <group/member="puppet"> ldap_result() failed: No such object
Oct 7 08:55:33 integration-slave1004 nslcd[1059]: [b0dc51] <group/member="puppet"> ldap_result() failed: No such object
Oct 7 08:55:33 integration-slave1004 nslcd[1059]: [e8944a] <group/member="root"> ldap_result() failed: No such object
Oct 7 08:55:33 integration-slave1004 nslcd[1059]: [e8944a] <group/member="root"> ldap_result() failed: No such object
..
Oct 7 09:18:48 integration-slave1004 puppet-agent[932]: Could not request certificate: getaddrinfo: Temporary failure in name resolution
I suspect the labs image for Ubuntu Precise hasn't been updated to take into account the recent LDAP changes (phasing out pmtpa / ldap renaming). It seems to me the image needs to be refreshed; for continuous integration purposes we still need Precise instances.
I just tested this a moment ago, and it worked fine for me. I installed a new precise base image on Friday that uses the new ldap settings and includes an updated bash and a separate /var/log partition.
OK -- that last comment was both right and wrong. New instances /do/ work. But there's still a smattering of virt0 and virt1000 references in them, which I am cleaning up.
I don't think the ldap thing is the problem. The log I pasted in comment 0 shows that it tried both. It's failing for a different reason.
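One rough way to check the "it tried both" reading is to tally the nslcd outcomes per LDAP server. The sketch below is illustrative only and not part of any labs tooling: `tally_ldap_outcomes` and `SAMPLE_LOG` are names made up here, and the sample is a four-line excerpt of the console output from comment 0. It only counts lines that name a server, so it says nothing about why `ldap_result()` or the puppet certificate request failed.

```python
import re
from collections import Counter

# Four-line excerpt of the nslcd output pasted in comment 0.
SAMPLE_LOG = """\
Oct 7 08:55:11 integration-slave1004 nslcd[901]: [b0dc51] <group/member="root"> ldap_start_tls_s() failed: Can't contact LDAP server: Connection timed out (uri="ldap://virt0.wikimedia.org:389")
Oct 7 08:55:11 integration-slave1004 nslcd[901]: [b0dc51] <group/member="root"> failed to bind to LDAP server ldap://virt0.wikimedia.org:389: Can't contact LDAP server: Connection refused
Oct 7 08:55:12 integration-slave1004 nslcd[901]: [b0dc51] <group/member="root"> connected to LDAP server ldap://virt1000.wikimedia.org:389
Oct 7 08:55:12 integration-slave1004 nslcd[901]: [b0dc51] <group/member="root"> ldap_result() failed: No such object
"""

# Matches the host part of an ldap://host:port URI.
LDAP_URI = re.compile(r"ldap://([\w.-]+):\d+")


def tally_ldap_outcomes(log_text):
    """Count (server, outcome) pairs for log lines that name an LDAP server."""
    counts = Counter()
    for line in log_text.splitlines():
        match = LDAP_URI.search(line)
        if not match:
            # Lines such as "ldap_result() failed" carry no server URI; skip them.
            continue
        if "connected to LDAP server" in line:
            outcome = "connected"
        elif "failed" in line:
            outcome = "failed"
        else:
            outcome = "other"
        counts[(match.group(1), outcome)] += 1
    return counts
```

On the excerpt above this counts two failures against virt0.wikimedia.org and one successful connection to virt1000.wikimedia.org, which matches the point that the instance did reach the second server.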
I just created new images last night which seem generally happier. Try again?
The existing instance was never fixed, but new instances do indeed seem to work fine (assuming it's not a race condition). I'll nuke the instance and re-create it for now.