Last modified: 2014-09-12 21:03:22 UTC
All the Echo and Flow tests failed starting 5 hours ago. The ones I've looked at failed in a few seconds. The build console log shows an error in mw-api-siteinfo.py parsing some json response, shown below. I think the real problem is http://en.wikipedia.beta.wmflabs.org/w/api.php is returning a 404. (If so, we ought to fix the script to report this instead of going on to report json decode errors!)

https://integration.wikimedia.org/ci/job/browsertests-Echo-en.wikipedia.beta.wmflabs.org-linux-chrome-sauce/36/console

03:37:02 + GEM_HOME=/mnt/jenkins-workspace/workspace/browsertests-Echo-en.wikipedia.beta.wmflabs.org-linux-chrome-sauce/../gems
03:37:02 ++ /srv/deployment/integration/slave-scripts/bin/mw-api-siteinfo.py http://en.wikipedia.beta.wmflabs.org/w/api.php git_branch
03:37:02 Traceback (most recent call last):
03:37:02   File "/srv/deployment/integration/slave-scripts/bin/mw-api-siteinfo.py", line 90, in <module>
03:37:02     main()
03:37:02   File "/srv/deployment/integration/slave-scripts/bin/mw-api-siteinfo.py", line 78, in main
03:37:02     siteinfo = json.loads(response.content)
03:37:02   File "/usr/lib/python2.7/json/__init__.py", line 326, in loads
03:37:02     return _default_decoder.decode(s)
03:37:02   File "/usr/lib/python2.7/json/decoder.py", line 369, in decode
03:37:02     raise ValueError(errmsg("Extra data", s, end, len(s)))
03:37:02 ValueError: Extra data: line 1 column 4 - line 1 column 18 (char 4 - 18)
03:37:02 + MEDIAWIKI_GIT_BRANCH=
03:37:02 Build step 'Execute shell' marked build as failure
03:37:02 Recording test results
03:37:02 IRC notifier plugin: Sending notification to: #wikimedia-qa

The CirrusSearch browser test, 3 minutes later, passed. But I just chose "Build now" and the test ran quickly.

BTW, "Sending notification to: #wikimedia-qa" didn't result in any output in the IRC channel.
Everything on http://en.wikipedia.beta.wmflabs.org/ returns a 404, including index.php and load.php. jeremyb noticed that /srv/mediawiki on the deployment machines is pretty empty. Gerrit change #159431 "beta: switch to /srv/mediawiki" was merged today and says "I made /srv/mediawiki be a symlink to /srv/common-local." The latter has all the expected files in it, but the former *isn't* a symlink. So maybe a puppet change didn't make it out, possibly related to bug 70597.
mw-api-siteinfo.py is in the integration/jenkins.git repository and should probably have better error handling with human-friendly messages :D
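For illustration, here is a rough sketch of what friendlier error handling could look like. It is not the actual code of mw-api-siteinfo.py: the SiteInfoError class and the parse_siteinfo helper are hypothetical names, and the sketch assumes the caller already has the HTTP status code and response body from whatever HTTP client the script uses.

```python
import json


class SiteInfoError(Exception):
    """Raised with a human-readable message when siteinfo cannot be read."""


def parse_siteinfo(url, status_code, body):
    """Return the decoded JSON body, or raise SiteInfoError explaining why not.

    Hypothetical helper: url, status_code and body are assumed to come
    from the HTTP client the script already uses.
    """
    # Report a bad HTTP status first, before attempting to decode anything.
    if status_code != 200:
        raise SiteInfoError(
            "%s returned HTTP %d, expected 200" % (url, status_code))
    try:
        return json.loads(body)
    except ValueError as exc:
        # json raises ValueError (or a subclass of it) on malformed input.
        raise SiteInfoError(
            "%s returned HTTP 200 but the body is not valid JSON (%s); "
            "body starts with: %r" % (url, exc, body[:80]))
```

With something like this, a 404 from api.php would surface as a single line naming the URL and the status code, rather than a json traceback.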
Beta labs seems to be working now (thanks to the work of jeremyb and others), though I think this warrants a post-mortem incident report. I filed bug 70695 for a clearer failure message.
See: https://wikitech.wikimedia.org/wiki/Incident_documentation/20140910-BetaCluster