Last modified: 2013-10-07 21:03:22 UTC
Right now CirrusSearch's in-place reindexing is pretty slow. We're actually able to overwhelm our small Elasticsearch cluster using a single-threaded, single-process in-place reindex. So, we should be more efficient about these reindexes. I see two angles of attack:
1. Optimize the client config: store.throttle.max_bytes_per_sec and its brothers.
2. Optimize the process of in-place reindexing:
2a. Populate the new index with no shard replicas - just primaries - then add replicas.
2b. Raise the refresh_interval on the index to something big or turn it off altogether.
2c. Other stuff?
The optimizations in 2 are _probably_ not required for initial index builds as MediaWiki is our bottleneck there.
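A rough sketch of what 2a and 2b could look like as settings changes, not what CirrusSearch actually does. It only builds the request bodies for Elasticsearch's update index settings endpoint (PUT /<index>/_settings) and sends nothing. The index name, the helper names, and the "serving" values (1 replica, 1s refresh) are assumptions made up for illustration.

```python
import json


def bulk_load_settings():
    """Settings to apply to the new index before copying documents into it."""
    return {
        "index": {
            "number_of_replicas": 0,   # 2a: primary shards only while populating
            "refresh_interval": "-1",  # 2b: turn periodic refresh off
        }
    }


def serving_settings(replicas=1, refresh_interval="1s"):
    """Settings to restore once the copy is done (values are assumptions)."""
    return {
        "index": {
            "number_of_replicas": replicas,
            "refresh_interval": refresh_interval,
        }
    }


def settings_request(index_name, settings):
    """Return (method, path, body) for an update index settings call."""
    body = json.dumps(settings, sort_keys=True)
    return ("PUT", "/%s/_settings" % index_name, body)


if __name__ == "__main__":
    new_index = "wiki_content_new"  # hypothetical index name
    steps = [
        ("before copy", bulk_load_settings()),
        ("after copy", serving_settings()),
    ]
    for label, settings in steps:
        method, path, body = settings_request(new_index, settings)
        print(label, method, path, body)
```

The idea is that the copy itself runs between the two calls, so replicas are built once from the finished primaries instead of being kept in sync document by document.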
We could also increase performance by throwing more machines at the problem and using more shards. When we get our bigger cluster we'll probably do that as well.
Change 88131 had a related patch set uploaded by Manybubbles: Optimize in place reindexing. https://gerrit.wikimedia.org/r/88131
Change 88131 merged by jenkins-bot: Optimize in place reindexing. https://gerrit.wikimedia.org/r/88131