Last modified: 2014-07-08 19:42:36 UTC
The CirrusSearch tests use http://git.wikimedia.org/blob/mediawiki%2Fextensions%2FCirrusSearch.git/d71770acb38d10cc2ef482df1452d39e8be3dfa2/tests%2Fbrowser%2Farticles%2Frashidun_caliphate.txt as an example of a reasonably complex page. They import the wikitext for the page without importing any of the templates or requiring scribunto, both of which would be required to get it to look right. Anyway, the page takes many many seconds for hhvm to render in vagrant. So many that the request times out. Eventually enough of the page is cached and requests for it are fast.
I tried this again this morning and it seems like hhvm gets starved for threads somehow. Other requests start failing when this happens. At least, that is the case on vagrant. Also, with the job queue bouncing against hhvm I think this can also cause some starvation.
I'm also not sure now if the large page is the trouble or the starvation caused by having the large page hit the job queue as well.
I added timeouts to the end of the access logs. This look unhealthy: - - [08/Jul/2014:17:57:25 +0000] \”POST /w/index.php?title=Special%3ARunJobs&tasks=jobs&maxjobs=100&sigexpiry=1404841997&signature=83d9513fa827c7c762ba47fd28b9b2866266e8aa HTTP/1.1\” 202 - 252802314 That's 253ish seconds to run jobs. I'm dropping my jobrunrate from the stupid high 100 to 5 to see if that helps. I don't imagine it will, though, because I don't make that many jobs at once. Also, this: - - [08/Jul/2014:17:59:34 +0000] \”POST /wiki/../w/api.php HTTP/1.1\” 200 116 125473169 That is posting my large page, I believe. About 125 seconds.
Clicking the "edit" button on my large page took 28 seconds in browser but was logged as taking 8.6 in the access log.... Posting the page timed out at the apache level after two minutes and took 259 seconds for hhvm to finish. Off to the profiler.
Actually, its caused by instant commons, I believe. It looks like it tries to fetch everything as soon as the page is created. Tries over and over again.
Recreated with zend. Each http get for something like http://commons.wikimedia.org/w/api.php?titles=File%3AMohammad_adil-Muslim_conquest_of_Egypt.PNG&iiprop=timestamp%7Cuser%7Ccomment%7Curl%7Csize%7Csha1%7Cmetadata%7Cmime%7Cmediatype%7Cextmetadata&prop=imageinfo&iimetadataversion=2&iiextmetadatamultilang=1&format=json&action=query&redirects=true&uselang=en is taking 5.5 seconds.
Its a vagrant problem. Inside vagrant: $ time curl google.com <HTML><HEAD><meta http-equiv="content-type" content="text/html;charset=utf-8"> <TITLE>301 Moved</TITLE></HEAD><BODY> <H1>301 Moved</H1> The document has moved <A HREF="http://www.google.com/">here</A>. </BODY></HTML> real 0m5.704s user 0m0.001s sys 0m0.030s vagrant@mediawiki-vagrant:/srv/images$ Outside vagrant: $ time curl google.com <HTML><HEAD><meta http-equiv="content-type" content="text/html;charset=utf-8"> <TITLE>301 Moved</TITLE></HEAD><BODY> <H1>301 Moved</H1> The document has moved <A HREF="http://www.google.com/">here</A>. </BODY></HTML> real 0m0.183s user 0m0.014s sys 0m0.009s
Connection problems? Sounds like what I was hitting yesterday / this morning. Still don't have a solution short of reloading the VM and hoping.
Not connection problems - its dns resolution! I have a fix but it is somewhat mysterious.
Change 144738 had a related patch set uploaded by Manybubbles: Speed up dns resolution in some cases https://gerrit.wikimedia.org/r/144738
Possibly related to https://www.virtualbox.org/ticket/11663 and a lot of other VirtualBox host-only netowrking problems listed as see-alsos on that bug.
(In reply to Bryan Davis from comment #11) > Possibly related to https://www.virtualbox.org/ticket/11663 and a lot of > other VirtualBox host-only netowrking problems listed as see-alsos on that > bug. Sounds likely.
Change 144738 merged by jenkins-bot: Speed up dns resolution in some cases https://gerrit.wikimedia.org/r/144738
Don't forget to restart the VirtualBox guest after applying the patch: `vagrant reload` works well.