Last modified: 2013-11-22 17:14:05 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and kept for historical purposes. It is not possible to log in, and links other than those to bug reports and their history might be broken. See T53350, the corresponding Phabricator task, for complete and up-to-date bug report information.
Bug 51350 - solr stress out the NFS server
Status: RESOLVED WORKSFORME
Product: Wikimedia Labs
Classification: Unclassified
Component: Infrastructure
Version: unspecified
Hardware: All All
Importance: Unprioritized normal
Target Milestone: ---
Assigned To: Ryan Lane
Keywords: performance
Depends on:
Blocks:
Reported: 2013-07-15 10:45 UTC by Antoine "hashar" Musso (WMF)
Modified: 2013-11-22 17:14 UTC (History)
CC List: 4 users

See Also:
Web browser: ---
Mobile Platform: ---
Assignee Huggle Beta Tester: ---


Attachments
network usage of solr project between 7/13/2013 8:00 and 7/13/2013 20:00 (14.21 KB, image/png)
2013-07-15 10:47 UTC, Antoine "hashar" Musso (WMF)
Ganglia CPU report of labstore 3 between 7/13/2013 8:00 and 7/13/2013 20:00 (15.20 KB, image/png)
2013-07-15 10:47 UTC, Antoine "hashar" Musso (WMF)

Description Antoine "hashar" Musso (WMF) 2013-07-15 10:45:37 UTC
This morning we found that both the beta and tools projects were "slow". The root cause is some jobs running in the solr project that saturate the I/O capacity of the NFS server (labstore3).

The workaround is to reboot the instance solr-mw2.pmtpa.wmflabs to stop the offending job, as instructed by Nikolas Everett: http://lists.wikimedia.org/pipermail/labs-l/2013-July/001381.html


The Solr experiment should run on a different NFS system than the shared one, probably a dedicated one.
Comment 1 Antoine "hashar" Musso (WMF) 2013-07-15 10:47:24 UTC
Created attachment 12846
network usage of solr project between 7/13/2013 8:00 and 7/13/2013 20:00
Comment 2 Antoine "hashar" Musso (WMF) 2013-07-15 10:47:54 UTC
Created attachment 12847
Ganglia CPU report of labstore 3 between 7/13/2013 8:00 and 7/13/2013 20:00
Comment 3 Ariel T. Glenn 2013-07-15 11:15:54 UTC
So in accordance with the above I rebooted solr-mw2 and load on labstore3 looks much better.
Comment 4 Nik Everett 2013-07-15 13:33:09 UTC
I should let everyone know what I'm doing:

I'm loading a copy of enwiki with all the current text (as of some backup) so I can index it.  No historical revisions as we won't be indexing them.  I've been told that I can't use a prod replica because it won't contain any text.

What is actually killing the NFS server is mysqld, not solr, elasticsearch, or any other new system.  I make no claims that those systems wouldn't put a similar load on nfs at some point in the future though.

I'd run this on my local system, but then I wouldn't be able to properly interact with the other systems in labs that I need for the experiment.
Comment 5 Antoine "hashar" Musso (WMF) 2013-11-22 17:14:05 UTC
Closing this since Elastic Search has been deployed in production, so I guess there is less need nowadays to load huge amounts of data into labs instances.
