Last modified: 2013-09-17 11:17:02 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and for historical purposes. It is not possible to log in and except for displaying bug reports and their history, links might be broken. See T56143, the corresponding Phabricator task for complete and up-to-date bug report information.
Bug 54143 - GlusterFS appears to be down (Transport endpoint is not connected, All subvolumes are down)
Status: RESOLVED FIXED
Product: Wikimedia Labs
Classification: Unclassified
Component: Infrastructure
Version: unspecified
Hardware: All All
Importance: Unprioritized critical
Target Milestone: ---
Assigned To: Ryan Lane
Reported: 2013-09-15 07:57 UTC by Nemo
Modified: 2013-09-17 11:17 UTC (History)



Attachments
dumps-1.pmtpa.wmflabs:/var/log/glusterfs/data-project.log (2.33 MB, text/x-log)
2013-09-15 07:58 UTC, Nemo

Description Nemo 2013-09-15 07:57:00 UTC
Around 06:45 UTC, /data/project disappeared for all instances in the dumps project with the error: Transport endpoint is not connected.

I've followed the steps in https://wikitech.wikimedia.org/wiki/Help:Shared_storage#Troubleshooting, including a reboot, but the error seems to be persistent and/or unrelated to my instance:

[2013-09-15 07:41:38.598177] E [socket.c:1715:socket_connect_finish] 0-dumps-project-client-0: connection to 10.0.0.41:24007 failed (Connection refused)
[2013-09-15 07:41:38.598968] E [socket.c:1715:socket_connect_finish] 0-dumps-project-client-1: connection to 10.0.0.42:24007 failed (Connection refused)
[2013-09-15 07:41:38.599003] E [afr-common.c:3665:afr_notify] 0-dumps-project-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up.
[2013-09-15 07:41:38.604444] I [fuse-bridge.c:4191:fuse_graph_setup] 0-fuse: switched to graph 0
[2013-09-15 07:41:38.604886] I [fuse-bridge.c:3376:fuse_init] 0-glusterfs-fuse: FUSE inited with protocol versions: glusterfs 7.13 kernel 7.17
[2013-09-15 07:41:38.605293] W [fuse-bridge.c:513:fuse_attr_cbk] 0-glusterfs-fuse: 2: LOOKUP() / => -1 (Transport endpoint is not connected)
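
For anyone triaging a similar log, here is a minimal sketch of how the excerpt above could be summarized automatically. It assumes the line layout seen in these six lines (`[timestamp] LEVEL [file:line:function] translator: message`); the names `summarize`, `LINE_RE`, `REFUSED_RE` and `SAMPLE` are hypothetical and not part of GlusterFS.

```python
import re

# Layout assumed from the excerpt in this report, not from GlusterFS docs:
# [timestamp] LEVEL [file:line:function] translator: message
LINE_RE = re.compile(
    r"^\[(?P<ts>[^\]]+)\] (?P<level>[A-Z]) \[(?P<src>[^\]]+)\] "
    r"(?P<xlator>\S+): (?P<msg>.*)$"
)
REFUSED_RE = re.compile(
    r"connection to (\S+):(\d+) failed \(Connection refused\)"
)


def summarize(lines):
    """Collect refused endpoints and note whether all subvolumes went down."""
    refused = []
    all_down = False
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match is None:
            continue  # skip lines that do not follow the assumed layout
        msg = match.group("msg")
        hit = REFUSED_RE.search(msg)
        if hit:
            refused.append((hit.group(1), int(hit.group(2))))
        if "All subvolumes are down" in msg:
            all_down = True
    return {"refused": refused, "all_down": all_down}


# Three lines copied from the log excerpt in this report.
SAMPLE = [
    "[2013-09-15 07:41:38.598177] E [socket.c:1715:socket_connect_finish] "
    "0-dumps-project-client-0: connection to 10.0.0.41:24007 failed "
    "(Connection refused)",
    "[2013-09-15 07:41:38.598968] E [socket.c:1715:socket_connect_finish] "
    "0-dumps-project-client-1: connection to 10.0.0.42:24007 failed "
    "(Connection refused)",
    "[2013-09-15 07:41:38.599003] E [afr-common.c:3665:afr_notify] "
    "0-dumps-project-replicate-0: All subvolumes are down. Going offline "
    "until atleast one of them comes back up.",
]

if __name__ == "__main__":
    print(summarize(SAMPLE))
```

If every endpoint shows up as refused and the "All subvolumes are down" flag is set, as it is here, the problem is more likely on the storage servers than on the client instance.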

By the way, while investigating which command to use to properly mount a volume, I found a comment suggesting that we shouldn't use Ubuntu's packages but the newer ones from <https://launchpad.net/~semiosis/+archive/ubuntu-glusterfs-3.3>: <http://unix-heaven.org/comment/1854#comment-1854>. They are supposed to solve some of the issues we have.
Comment 1 Nemo 2013-09-15 07:58:28 UTC
Created attachment 13286
dumps-1.pmtpa.wmflabs:/var/log/glusterfs/data-project.log
Comment 2 Andrew Bogott 2013-09-15 20:11:38 UTC
Is this failure happening in a particular project, or in /all/ projects?
Comment 3 Nemo 2013-09-15 20:17:02 UTC
(In reply to comment #2)
> Is this failure happening in a particular project, or in /all/ projects?

I've asked on the labs-l mailing list but didn't get an answer. I also don't remember whether I have access to other projects, or how to find a list of the projects I have access to.
Comment 4 Andrew Bogott 2013-09-15 20:17:32 UTC
Oh, sorry, you said 'dumps'.  Should be fixed -- please close this bug if you can confirm.
Comment 5 Nemo 2013-09-15 20:20:58 UTC
Yes! Thank you so much. :D
Comment 6 Nemo 2013-09-17 11:17:02 UTC
For the record, on the instance that I hadn't rebooted, glusterfs was extremely slow for a while, apparently for as long as labstore1 and labstore2 were very busy (in terms of network and CPU) communicating with each other and with the instance's glusterfs process. (Extremely slow as in ls on a directory with 50 files taking many minutes.)

Now I'm also seeing weird errors, like this file which seems to exist and not exist at the same time, but I expect a reboot will fix it:
$ ls 2011/2011-10-01.csv
2011/2011-10-01.csv
nemobis@dumps-2:/data/project/commonsgrab2$ ls /data/project/commonsgrab2/2011/2011-10-01.csv
ls: cannot access /data/project/commonsgrab2/2011/2011-10-01.csv: Input/output error
nemobis@dumps-2:/data/project/commonsgrab2$ stat /data/project/commonsgrab2/2011/2011-10-01.csv
stat: cannot stat `/data/project/commonsgrab2/2011/2011-10-01.csv': Input/output error
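
To find other affected files without checking them one by one, something like the following sketch may help. It assumes the inconsistency shows up as names that a directory listing returns but that `stat` rejects (with `EIO` in the output above); `find_unstatable` and `describe` are hypothetical helper names.

```python
import errno
import os


def find_unstatable(directory):
    """Return (name, errno) for entries that are listed but cannot be stat'ed."""
    problems = []
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        try:
            os.stat(path)
        except OSError as exc:
            # On the affected mount this would be expected to be errno.EIO.
            problems.append((name, exc.errno))
    return problems


def describe(problems):
    """Render the findings with symbolic errno names where known."""
    return [
        "%s: %s" % (name, errno.errorcode.get(code, str(code)))
        for name, code in problems
    ]


if __name__ == "__main__":
    # Directory taken from the shell session in this comment.
    target = "/data/project/commonsgrab2/2011"
    if os.path.isdir(target):
        for line in describe(find_unstatable(target)):
            print(line)
```

The listing is sorted only to make the output stable. On a mount that is as slow as described above, the scan itself could take a long time.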


