Last modified: 2013-11-22 17:14:05 UTC
This morning we found out that both the beta and tools projects were "slow". The root cause is some jobs running on the solr project which exhaust the I/O capacity of the NFS server (labstore3). The workaround is to reboot the instance solr-mw2.pmtpa.wmflabs to stop the job causing the load, as instructed by Nikolas Everett: http://lists.wikimedia.org/pipermail/labs-l/2013-July/001381.html The Solr experiment should be run on a different NFS system than the shared one. I guess a dedicated one.
Created attachment 12846: network usage of solr project between 7/13/2013 8:00 and 7/13/2013 20:00
Created attachment 12847: Ganglia CPU report of labstore3 between 7/13/2013 8:00 and 7/13/2013 20:00
So, in accordance with the above, I rebooted solr-mw2 and the load on labstore3 looks much better.
I should let everyone know what I'm doing: I'm loading a copy of enwiki with all the current text (as of some backup) so I can index it. No historical revisions, as we won't be indexing them. I've been told that I can't use a prod replica because it won't contain any text. What is actually killing the NFS server is mysqld, not solr, elasticsearch, or any other new system. I make no claims that those systems wouldn't put a similar load on NFS at some point in the future, though. I'd run this on my local system, but then I wouldn't be able to properly interact with the other systems in labs that I need for the experiment.
Closing this since Elasticsearch has been deployed in production, so I guess there is less need nowadays to load huge amounts of data into a labs instance.