Last modified: 2014-05-13 12:57:03 UTC
Accessing <https://www.wikidata.org/wiki/Special:Search> while logged out (anonymously), I'm occasionally getting 10-second-plus render times. This is a blank search input form. It should load much faster.

Examples:

$ curl -s "https://www.wikidata.org/wiki/Special:Search" | grep "Served by"
<!-- Served by srv261 in 0.110 secs. -->
$ curl -s "https://www.wikidata.org/wiki/Special:Search" | grep "Served by"
<!-- Served by mw33 in 0.116 secs. -->
$ curl -s "https://www.wikidata.org/wiki/Special:Search" | grep "Served by"
<!-- Served by srv210 in 0.121 secs. -->
$ curl -s "https://www.wikidata.org/wiki/Special:Search" | grep "Served by"
<!-- Served by srv210 in 9.637 secs. -->
$ curl -s "https://www.wikidata.org/wiki/Special:Search" | grep "Served by"
<!-- Served by srv207 in 10.127 secs. -->
$ curl -s "https://www.wikidata.org/wiki/Special:Search" | grep "Served by"
<!-- Served by srv237 in 3.108 secs. -->
$ curl -s "https://www.wikidata.org/wiki/Special:Search" | grep "Served by"
<!-- Served by mw38 in 0.147 secs. -->
$ curl -s "https://www.wikidata.org/wiki/Special:Search" | grep "Served by"
<!-- Served by mw38 in 10.154 secs. -->
I'm starting to suspect this might be related to bug 42423 rather than being specific to wikidata.org. I'm getting intermittent slow responses from other sites as well.

Example:

$ curl -s "https://hi.wikipedia.org/wiki/%E0%A4%B5%E0%A4%BF%E0%A4%B6%E0%A5%87%E0%A4%B7:%E0%A4%96%E0%A5%8B%E0%A4%9C" | grep "Served by"
<!-- Served by srv212 in 10.141 secs. -->

This is hi.wikipedia.org's version of Special:Search, accessed anonymously via curl.
Hmm, and another. This really needs to be investigated.

$ curl -s "https://hi.wikipedia.org/wiki/%E0%A4%B5%E0%A4%BF%E0%A4%B6%E0%A5%87%E0%A4%B7:%E0%A4%96%E0%A5%8B%E0%A4%9C" | grep "Served by"
<!-- Served by mw59 in 10.130 secs. -->

As far as I'm aware, a server taking over ten seconds to serve an anonymous page view is always wrong.
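To get beyond hand-picked samples, the "Served by" comments could be tallied with a small script. This is a minimal sketch in Python, assuming the comment format shown in the curl output; the function names and the 5-second threshold are my own choices for illustration, and the sample data is just the eight wikidata.org results from the first comment.

```python
import re

# Matches the HTML comment seen in the curl output, e.g.
# <!-- Served by srv210 in 9.637 secs. -->
SERVED_BY = re.compile(r"<!-- Served by (\S+) in ([0-9.]+) secs\. -->")


def parse_served_by(text):
    """Return a list of (host, seconds) pairs found in the text."""
    return [(host, float(secs)) for host, secs in SERVED_BY.findall(text)]


def slow_samples(samples, threshold=5.0):
    """Keep only samples slower than `threshold` seconds (an arbitrary cutoff)."""
    return [(host, secs) for host, secs in samples if secs > threshold]


# The eight wikidata.org samples from the first comment.
SAMPLE_OUTPUT = """
<!-- Served by srv261 in 0.110 secs. -->
<!-- Served by mw33 in 0.116 secs. -->
<!-- Served by srv210 in 0.121 secs. -->
<!-- Served by srv210 in 9.637 secs. -->
<!-- Served by srv207 in 10.127 secs. -->
<!-- Served by srv237 in 3.108 secs. -->
<!-- Served by mw38 in 0.147 secs. -->
<!-- Served by mw38 in 10.154 secs. -->
"""

samples = parse_served_by(SAMPLE_OUTPUT)
slow = slow_samples(samples)
print(f"{len(slow)} of {len(samples)} samples took more than 5 seconds")
for host, secs in slow:
    print(f"  {host}: {secs:.3f}s")
```

In these samples srv210 and mw38 each show up once fast and once slow, which may hint that the delay is not tied to a particular application server.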
The application servers for [[Special:Search]] are randomly slow (more than 5 seconds on some requests). Maybe this is due to slowness in a backend server?
gdash shows some latency spikes: http://gdash.wikimedia.org/dashboards/searchlatency/
Are the same queries slow with Cirrus? If not I'm inclined to say this is cirrus-fixed and not worry about it.

(In reply to Antoine "hashar" Musso from comment #4)
> gdash show up some latency spikes
> http://gdash.wikimedia.org/dashboards/searchlatency/

Getting those metrics updated for Elasticsearch is a good idea imho.
With Lucene, I guess we did simple round-robin load balancing. The high-latency queries were probably hitting a single overloaded search server. I have no idea how ElasticSearch handles load balancing of requests across search servers. It might be using a smarter load balancing algorithm.

The dashboard at http://gdash.wikimedia.org/dashboards/searchlatency/ is defined in operations/puppet.git, by the files under files/gdash/dashboards/searchlatency/. The graphs are based on the statsd metric MediaWiki.LuceneSearchSet.newFromQuery. It should be fairly easy to add similar graphs if the ElasticSearch extension provides a similar wfProfileIn() call.
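To illustrate the round-robin guess: if requests rotate evenly over N search backends and one of them is overloaded, roughly one request in N comes back slow, regardless of which application server handled it. Below is a toy simulation in Python; the backend names and latencies are made up, and nothing here reflects the actual Lucene or Elasticsearch setup.

```python
from itertools import cycle

# Hypothetical backends and their per-request latency in seconds.
# "search3" stands in for a single overloaded search server.
BACKEND_LATENCY = {
    "search1": 0.1,
    "search2": 0.1,
    "search3": 10.0,
    "search4": 0.1,
}


def round_robin_latencies(backend_latency, num_requests):
    """Assign requests to backends in strict rotation and return each latency."""
    rotation = cycle(backend_latency)  # cycles over the dict keys in insertion order
    return [backend_latency[next(rotation)] for _ in range(num_requests)]


latencies = round_robin_latencies(BACKEND_LATENCY, 100)
slow_count = sum(1 for secs in latencies if secs > 5.0)
print(f"{slow_count} of {len(latencies)} requests were slow")
```

A balancer that tracked outstanding requests or response times would steer traffic away from the slow backend; whether ElasticSearch does something like that is the open question here.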