Last modified: 2013-12-04 04:37:43 UTC
I created a jenkins-slave user on wikitech. The wiki user is part of the 'shell' group and I have generated an SSH key pair for it. The public key is in LDAP. On integration-jenkins2, /home/jenkins-slave/.ssh contains the public and private keys.

When attempting to connect locally, the key gets rejected:

    jenkins-slave@integration-jenkins2:~$ ssh localhost
    Permission denied (publickey).

Running strace on the local ssh daemon yields:

    open("/etc/ssh/userkeys/jenkins-slave/.ssh/authorized_keys", O_RDONLY|O_NONBLOCK) = -1 EACCES (Permission denied)
    open("/public/keys/jenkins-slave/.ssh/authorized_keys", O_RDONLY|O_NONBLOCK) = -1 ENOENT (No such file or directory)

And indeed the public key did not get exported on labstore1:/keys:

    # ls -d /public/keys/jenkins-slave
    ls: cannot access /public/keys/jenkins-slave: No such file or directory

The related mount is:

    labstore1.pmtpa.wmnet:/keys on /public/keys type nfs (rw,vers=3,sloppy,addr=10.0.0.41)
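For anyone hitting the same symptom, the two lookups seen in the strace output can be checked by hand. The following is only a rough sketch: the function names check_key_file and check_user_keys are made up for illustration, and the two paths are simply the ones sshd tried above. Note that the test cannot tell a missing file apart from an unsearchable parent directory (both print "missing"), and root can read everything, so run it as the user sshd uses for the lookup if you want to reproduce the EACCES case.

```shell
#!/bin/sh
# Hypothetical helper: report the state of one candidate authorized_keys file.
# Prints "<path>: ok", "<path>: missing" or "<path>: unreadable".
check_key_file() {
    path=$1
    if [ ! -e "$path" ]; then
        # Also reached when a parent directory cannot be searched.
        echo "$path: missing"
    elif [ ! -r "$path" ]; then
        echo "$path: unreadable"
    else
        echo "$path: ok"
    fi
}

# Hypothetical helper: check both locations sshd tried in the strace output.
check_user_keys() {
    user=$1
    check_key_file "/etc/ssh/userkeys/$user/.ssh/authorized_keys"
    check_key_file "/public/keys/$user/.ssh/authorized_keys"
}
```

Calling `check_user_keys jenkins-slave` on the affected instance would print one status line per path.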
Looking at stat() of the authorized keys, the last change to them was on 2013-11-29 01:14:17 UTC, so something got changed or broken on Thursday 28th Nov (PST). Possible culprit: https://gerrit.wikimedia.org/r/#/c/98030/ ("Remove user accounts from the labstore boxes").
The cron that manages this runs on labstore2 rather than labstore1. (I'm not sure why.) It calls manage-keys, which adds keys to /mnt/keys and logs to /var/log/manage-keys.log. In a sort of standard gluster screwup, /mnt/keys was unreachable on labstore2, so manage-keys was failing. I gave gluster a kick in the pants and remounted /mnt/keys and now all seems well. The cron that runs manage-keys is not in puppet, which is wrong -- but, we're hoping to stop using Gluster in labs entirely in the next month or two, so I'm not going to worry much about organizing things.
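Until Gluster goes away, a guard around the cron job could at least make this failure loud instead of silent. This is a minimal sketch, not what currently runs on labstore2: run_if_mounted is a made-up name, it assumes the mountpoint utility (util-linux or busybox) is installed, and a hung Gluster mount might make the check itself block.

```shell
#!/bin/sh
# Hypothetical guard: run a command only if the given directory is a mount point.
# Usage: run_if_mounted DIR COMMAND [ARGS...]
run_if_mounted() {
    dir=$1
    shift
    if mountpoint -q "$dir"; then
        "$@"
    else
        # Complain on stderr so cron mails the failure or a log picks it up.
        echo "$dir is not mounted, skipping: $*" >&2
        return 1
    fi
}
```

The cron entry could then call something like `run_if_mounted /mnt/keys manage-keys`; the real manage-keys arguments are not shown in this thread, so that invocation is an assumption.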
That solved it and unlocked two tasks I was working on. Thank you!