Last modified: 2014-09-16 08:15:45 UTC
When betalabs was completely down with all page and API requests giving a 404, browser tests failed, bug 70648. That's obviously expected, but the console log for failing tests ends with mw-api-siteinfo.py reporting an opaque json parsing error (below) which took developers time to translate into "site was down". The console log should contain "ERROR 404 response from http://en.wikipedia.beta.wmflabs.org/w/api.php , aggghhh danger!" The fix in jenkins.git bin/mw-api-siteinfo.py is to check the response object response = requests.get(mw_api_url, params=API_QUERY) for validity/errors and die with an informative failure before going on to siteinfo = json.loads(response.content) According to http://docs.python-requests.org/en/latest/user/quickstart/#response-status-codes , just adding response.raise_for_status() causes an exception4 when the API request fails. From bug 70648, console log for the failed job https://integration.wikimedia.org/ci/job/browsertests-Echo-en.wikipedia.beta.wmflabs.org-linux-chrome-sauce/36/console contained: 03:37:02 + GEM_HOME=/mnt/jenkins-workspace/workspace/browsertests-Echo-en.wikipedia.beta.wmflabs.org-linux-chrome-sauce/../gems 03:37:02 ++ /srv/deployment/integration/slave-scripts/bin/mw-api-siteinfo.py http://en.wikipedia.beta.wmflabs.org/w/api.php git_branch 03:37:02 Traceback (most recent call last): 03:37:02 File "/srv/deployment/integration/slave-scripts/bin/mw-api-siteinfo.py", line 90, in <module> 03:37:02 main() 03:37:02 File "/srv/deployment/integration/slave-scripts/bin/mw-api-siteinfo.py", line 78, in main 03:37:02 siteinfo = json.loads(response.content) 03:37:02 File "/usr/lib/python2.7/json/__init__.py", line 326, in loads 03:37:02 return _default_decoder.decode(s) 03:37:02 File "/usr/lib/python2.7/json/decoder.py", line 369, in decode 03:37:02 raise ValueError(errmsg("Extra data", s, end, len(s))) 03:37:02 ValueError: Extra data: line 1 column 4 - line 1 column 18 (char 4 - 18)
It would be nice if the script would have helpful error messages, but I do not think this script's job is to check if beta is down. We should have a separate job that checks that.
(In reply to Željko Filipin from comment #1) > I do not think this script's job is to check if beta is down. I agree, but it should die when its request fails, not later when it json.loads() the bad response. I think my proposed 1-line fix will do this.
(In reply to spage from comment #2) > I agree, but it should die when its request fails, not later when it > json.loads() the bad response. I think my proposed 1-line fix will do this. +1
Watch out, we have python requests 0.8.2 which might not have response.raise_for_status()
- https://bugzilla.wikimedia.org/show_bug.cgi?id=51497 which is dealing with monitoring.
Change 160616 had a related patch set uploaded by Hashar: mw-api-siteinfo: raise on http error https://gerrit.wikimedia.org/r/160616
Change 160616 merged by jenkins-bot: mw-api-siteinfo: raise on http error https://gerrit.wikimedia.org/r/160616
Excellent finding S! I have copy pasted your one-liner fix and made you an Author: :] Thank you!