Last modified: 2014-05-06 15:15:58 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and kept for historical purposes. It is not possible to log in, and apart from the bug reports and their history, links might be broken. See T63141, the corresponding Phabricator task, for complete and up-to-date bug report information.
Bug 61141 - GlusterFS readonly on integration project
Status: RESOLVED FIXED
Product: Wikimedia Labs
Classification: Unclassified
Component: General
Version: unspecified
Hardware: All All
Importance: Low major
Target Milestone: ---
Assigned To: Nobody - You can work on this!
Depends on:
Blocks:
 
Reported: 2014-02-10 15:17 UTC by Antoine "hashar" Musso (WMF)
Modified: 2014-05-06 15:15 UTC
CC List: 4 users




Description Antoine "hashar" Musso (WMF) 2014-02-10 15:17:40 UTC
On the integration labs project, the GlusterFS volume for /home is broken and has become read-only:

  # touch /home/jenkins-deploy/foobar
  touch: cannot touch `foobar': Read-only file system
  #
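The failing `touch` above is the typical symptom. A minimal sketch of a probe a job could run before relying on a shared directory, assuming Python 3 is available on the instance: it tries to create and remove a scratch file and returns False only when the kernel reports EROFS ("Read-only file system"). The function name `is_writable` and the probe-file prefix are hypothetical, not part of any Jenkins or GlusterFS tooling.

```python
import errno
import os
import tempfile


def is_writable(directory: str) -> bool:
    """Create and remove a scratch file in `directory` (hypothetical helper).

    Returns False when the write is rejected with EROFS, the error that
    touch reported. Any other OSError (e.g. EACCES, ENOENT) is re-raised
    so that a different problem is not mistaken for a read-only mount.
    """
    try:
        fd, path = tempfile.mkstemp(dir=directory, prefix=".write-probe-")
    except OSError as exc:
        if exc.errno == errno.EROFS:
            return False
        raise
    # The write succeeded: clean up the probe file.
    os.close(fd)
    os.unlink(path)
    return True
```

On the affected instances, `is_writable("/home/jenkins-deploy")` would be expected to return False. Re-raising other errors is deliberate: a permissions problem on the same path calls for a different fix.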


The mount point is:

projectstorage.pmtpa.wmnet:/integration-home on /home type fuse.glusterfs (rw,default_permissions,allow_other,max_read=131072)



/var/log/glusterfs/home.log shows many occurrences of:

 remote operation failed: Transport endpoint is not connected


On Sunday, 9 February, at around 06:40, the log shows:

[2014-02-09 06:41:01.523691] W [socket.c:1512:__socket_proto_state_machine] 0-integration-home-client-0: reading from socket failed. Error (Transport endpoint is not connected), peer (10.0.0.41:24448)
[2014-02-09 06:41:01.541383] I [client.c:2090:client_rpc_notify] 0-integration-home-client-0: disconnected
[2014-02-09 06:41:13.483376] E [socket.c:1715:socket_connect_finish] 0-integration-home-client-0: connection to 10.0.0.41:24448 failed (Connection refused)
[2014-02-09 06:47:17.486672] I [glusterfsd.c:889:reincarnate] 0-glusterfsd: Fetching the volume file from server...
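To line up the disconnect events (for example across several instances), the log lines can be split into fields. The sketch below assumes the layout visible in the excerpt above, `[timestamp] LEVEL [file:line:function] name: message`, where the name field (e.g. `0-integration-home-client-0`) appears to identify the client translator. The regular expression, the field names and the helpers `parse_line` and `events_for` are hypothetical and inferred from these lines, not taken from GlusterFS documentation.

```python
import re
from datetime import datetime
from typing import Optional

# Layout inferred from the excerpt:
# [YYYY-MM-DD HH:MM:SS.ffffff] L [file.c:line:function] name: message
LOG_LINE = re.compile(
    r"^\[(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+)\] "
    r"(?P<level>[A-Z]) "
    r"\[(?P<source>[^\]]+)\] "
    r"(?P<xlator>\S+): "
    r"(?P<message>.*)$"
)


def parse_line(line: str) -> Optional[dict]:
    """Split one log line into fields, or return None if it does not match."""
    match = LOG_LINE.match(line.strip())
    if match is None:
        return None
    record = match.groupdict()
    # %f accepts up to six fractional digits, as in the lines above.
    record["timestamp"] = datetime.strptime(record["timestamp"], "%Y-%m-%d %H:%M:%S.%f")
    return record


def events_for(lines, xlator: str) -> list:
    """Return (timestamp, level, message) tuples for lines from one name field."""
    events = []
    for line in lines:
        record = parse_line(line)
        if record is not None and record["xlator"] == xlator:
            events.append((record["timestamp"], record["level"], record["message"]))
    return events
```

Lines without the bracketed prefix, such as the bare "remote operation failed" excerpt, simply return None rather than raising.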

After that, the log contains messages saying the volume lacks quorum:

 0-integration-home-replicate-0: failing truncate due to lack of quorum
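One plausible reading of the quorum message, assuming (the report does not say so) that integration-home is a two-brick replica volume with `cluster.quorum-type` set to `auto`: under that setting the client allows writes only if more than half of the bricks in the replica set are up, or exactly half are up and one of them is the first brick. With client-0, the first brick, disconnected, one of two bricks remains, which is exactly half but without the first brick, so modifying operations such as truncate are refused while reads can still be served. The function below is a hypothetical restatement of that rule for illustration, not GlusterFS code.

```python
from typing import Sequence


def has_client_quorum(bricks_up: Sequence[bool]) -> bool:
    """Sketch of the 'auto' client-quorum rule for one replica set.

    bricks_up[i] is True if the i-th brick (client-i) is reachable.
    Quorum holds if more than half of the bricks are up, or if exactly
    half are up and the first brick is one of them.
    """
    n = len(bricks_up)
    if n == 0:
        return False
    up = sum(bricks_up)
    if 2 * up > n:
        return True
    return 2 * up == n and bricks_up[0]
```

For the assumed two-brick case, `has_client_quorum([False, True])` is False, matching the "failing truncate due to lack of quorum" message, whereas losing only the second brick (`[True, False]`) would keep the volume writable.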
Comment 1 Antoine "hashar" Musso (WMF) 2014-02-10 17:08:04 UTC
Although this is an important issue, it is not much of a priority, since the only impact was that the jenkins-deploy user's home directory was not writable by Jenkins jobs.

I have mitigated the issue by moving the jenkins-deploy home directory under /mnt (bug 61144). This also avoids a potential race condition in which jobs on different instances attempt to write to a shared directory (/home/jenkins-deploy).
Comment 2 Antoine "hashar" Musso (WMF) 2014-05-06 15:15:58 UTC
Fixed by the migration to EQIAD. We are now using NFS.
