Last modified: 2014-05-06 15:15:58 UTC
On the integration labs project, the GlusterFS volume for /home is broken and is marked read-only:

# touch /home/jenkins-deploy/foobar
touch: cannot touch `foobar': Read-only file system
#

The mount point is:

projectstorage.pmtpa.wmnet:/integration-home on /home type fuse.glusterfs (rw,default_permissions,allow_other,max_read=131072)

/var/log/glusterfs/home.log shows a bunch of:

remote operation failed: Transport endpoint is not connected

On Sunday 9 Feb at 6:40 we had:

[2014-02-09 06:41:01.523691] W [socket.c:1512:__socket_proto_state_machine] 0-integration-home-client-0: reading from socket failed. Error (Transport endpoint is not connected), peer (10.0.0.41:24448)
[2014-02-09 06:41:01.541383] I [client.c:2090:client_rpc_notify] 0-integration-home-client-0: disconnected
[2014-02-09 06:41:13.483376] E [socket.c:1715:socket_connect_finish] 0-integration-home-client-0: connection to 10.0.0.41:24448 failed (Connection refused)
[2014-02-09 06:47:17.486672] I [glusterfsd.c:889:reincarnate] 0-glusterfsd: Fetching the volume file from server...

Then we get messages saying it lacks quorum:

0-integration-home-replicate-0: failing truncate due to lack of quorum
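For anyone triaging a similar outage, the symptoms quoted above can be counted in the client log with a short script. This is only a rough sketch: the function name `summarize` and the labels in `SYMPTOMS` are hypothetical names chosen for illustration, and the timestamp prefix is assumed to look like the lines quoted in this report.

```python
import re
from collections import Counter

# Leading "[YYYY-MM-DD HH:MM:SS.ffffff] L " prefix, assumed from the lines quoted in this report.
PREFIX = re.compile(r"^\[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+)\]\s+([A-Z])\s")

# Hypothetical labels mapped to message substrings that appear in this report.
SYMPTOMS = {
    "disconnected": "Transport endpoint is not connected",
    "refused": "Connection refused",
    "no_quorum": "lack of quorum",
}


def summarize(lines):
    """Count each symptom and record the first timestamp at which it appears.

    Lines without a recognizable timestamp prefix are still counted,
    but they do not contribute to first_seen.
    """
    counts = Counter()
    first_seen = {}
    for line in lines:
        line = line.strip()
        match = PREFIX.match(line)
        timestamp = match.group(1) if match else None
        for label, needle in SYMPTOMS.items():
            if needle in line:
                counts[label] += 1
                if timestamp and label not in first_seen:
                    first_seen[label] = timestamp
    return counts, first_seen
```

Passing an open file handle for /var/log/glusterfs/home.log to `summarize` would return the per-symptom counts and the first timestamp seen for each, which helps confirm whether the quorum failures started only after the disconnect.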
Although this is an important issue, it is not a high priority, since the only impact was that the jenkins-deploy user's home directory was not writable by Jenkins jobs. I have mitigated that by moving the jenkins-deploy homedir under /mnt (bug 61144), which also avoids a potential race condition between jobs on different instances attempting to write to a shared directory (/home/jenkins-deploy).
Fixed by the migration to EQIAD. We are now using NFS.