Last modified: 2014-07-25 15:01:05 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and for historical purposes. It is not possible to log in and except for displaying bug reports and their history, links might be broken. See T67486, the corresponding Phabricator task for complete and up-to-date bug report information.
Bug 65486 - beta labs mysteriously goes read-only overnight
Status: RESOLVED FIXED
Product: Wikimedia Labs
Classification: Unclassified
Component: deployment-prep (beta)
Version: unspecified
Hardware: All All
Importance: Unprioritized normal
Target Milestone: ---
Assigned To: Nobody - You can work on this!
Keywords: browser-test-bug
Depends on:
Blocks:
Reported: 2014-05-19 14:28 UTC by Chris McMahon
Modified: 2014-07-25 15:01 UTC (History)
CC List: 11 users

See Also:
Web browser: ---
Mobile Platform: ---
Assignee Huggle Beta Tester: ---


Attachments
Screenshot (227.62 KB, image/png)
2014-06-03 18:41 UTC, Rummana Yasmeen

Description Chris McMahon 2014-05-19 14:28:53 UTC
I've been seeing this recently in the overnight runs of the browser tests. The VisualEditor build fails with a modal dialog that says "Error loading data from server: readonly. The wiki is currently in read-only mode. Would you like to retry?"

Here is an example from the overnight run Sunday 18 May: https://wmf.ci.cloudbees.com/job/VisualEditor-en.wikipedia.beta.wmflabs.org-linux-chrome/512/testReport/(root)/VisualEditor/Edit_with_strings__outline_example_____Editing_with_%C3%84%C3%8B%C3%8F%C3%96%C3%9C___Editing_with_%C3%84%C3%8B%C3%8F%C3%96%C3%9C___/

I can't think of any reason why beta labs would be in read-only mode late on a Sunday (PDT).  

I suspect this may also be the cause of the occasional failures in other builds that give less information, for example the "too many connection resets (due to Net::ReadTimeout - Net::ReadTimeout)" that we see in the MobileFrontend builds: https://wmf.ci.cloudbees.com/job/MobileFrontend-en.m.wikipedia.beta.wmflabs.org-linux-firefox/571/testReport/(root)/Check%20UI%20components/Check_existence_of_important_UI_components_on_other_pages_/
Comment 2 Arthur Richards 2014-05-20 23:03:26 UTC
If I recall correctly, this is something that can happen when things go sideways with the database. Not sure if that's what's going on here, but may be worth looking into.
Comment 3 Andre Klapper 2014-05-26 15:13:49 UTC
Who could investigate this?
Comment 4 Antoine "hashar" Musso (WMF) 2014-05-26 20:34:51 UTC
In one Sauce Labs failure, the test was POSTing to "http://en.wikipedia.beta.wmflabs.org/wiki/User:Selenium_user/firefox?vehidebetadialog=true&veaction=edit"

The message:

  The wiki is currently in read-only mode. Would you like to retry?

comes from ApiBase::dieReadOnly(). That method seems to be called only when wfReadOnly() is true, which is some legacy code that lets us create a file on the cluster to disable edits entirely.

There is close to a 0% chance it is being triggered that way unless something messes with $wgReadOnly. So most probably the i18n message is being reused by another code path.
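
For anyone unfamiliar with this kind of switch, the check described above can be modeled roughly as follows. This is only a minimal Python sketch, not MediaWiki's PHP code. The names read_only_reason, is_read_only, configured_reason and lock_file are hypothetical. It assumes nothing beyond what is stated above: the wiki reports read-only when a configured value is set or when a lock file exists.

```python
import os
from typing import Optional


def read_only_reason(configured_reason: Optional[str],
                     lock_file: Optional[str]) -> Optional[str]:
    """Return why writes are disabled, or None if writes are allowed.

    Hypothetical model: configured_reason stands in for a global
    setting, lock_file for a file whose presence disables edits.
    """
    # A configured reason wins over everything else.
    if configured_reason:
        return configured_reason
    # Otherwise the mere presence of the lock file disables edits;
    # its contents, if any, are used as the reason.
    if lock_file and os.path.exists(lock_file):
        with open(lock_file, encoding="utf-8") as handle:
            return handle.read().strip() or "locked by file"
    return None


def is_read_only(configured_reason: Optional[str],
                 lock_file: Optional[str]) -> bool:
    """True when either source reports a read-only reason."""
    return read_only_reason(configured_reason, lock_file) is not None
```

In this model neither source changes on its own overnight, which is why an unexpected read-only error points at some other code path or at something modifying the setting at runtime.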
Comment 5 James Forrester 2014-05-27 21:19:55 UTC
FWIW VisualEditor doesn't know about <readonlytext> – it's just passing on what it gets from the API.
Comment 6 Chris McMahon 2014-05-27 21:27:41 UTC
Right, and we know about the unexpected readonly status only because VisualEditor seems to be the one thing that displays that error in a JavaScript confirm modal dialog. It might manifest in other ways that we would not see if not for the modal dialog that stops the test.
Comment 7 Rummana Yasmeen 2014-06-03 18:41:46 UTC
Created attachment 15558
Screenshot

I reproduced this issue today on beta labs; screenshot attached.
Comment 8 Chris McMahon 2014-06-03 19:27:27 UTC
Antoine, would these messages be relevant? They do not seem to happen at any particular interval, but they might be correlated with the time at which Rummana saw the problem.

@deployment-bastion:/data/project/logs$ tail -f dberror.log 
Tue Jun 3 17:17:09 UTC 2014	deployment-apache01	testwiki	Error connecting to 10.68.17.94: :real_connect(): (42000/1049): Unknown database 'testwikidatawiki'
Tue Jun 3 17:17:09 UTC 2014	deployment-apache01	testwiki	Connection error: No working slave server: Unknown error (10.68.17.94)
Tue Jun 3 17:17:09 UTC 2014	deployment-apache01	testwiki	Error connecting to 10.68.17.94: :real_connect(): (42000/1049): Unknown database 'testwikidatawiki'
Tue Jun 3 17:17:09 UTC 2014	deployment-apache01	testwiki	Connection error: No working slave server: Unknown error (10.68.17.94)
Tue Jun 3 17:17:09 UTC 2014	deployment-apache01	testwiki	Error connecting to 10.68.17.94: :real_connect(): (42000/1049): Unknown database 'testwikidatawiki'
Tue Jun 3 17:17:09 UTC 2014	deployment-apache01	testwiki	Connection error: No working slave server: Unknown error (10.68.17.94)
Tue Jun 3 17:50:48 UTC 2014	deployment-apache01	testwiki	Error connecting to 10.68.17.94: :real_connect(): (42000/1049): Unknown database 'testwikidatawiki'
Tue Jun 3 17:50:48 UTC 2014	deployment-apache01	testwiki	Connection error: No working slave server: Unknown error (10.68.17.94)
Tue Jun 3 19:20:48 UTC 2014	deployment-apache01	testwiki	Error connecting to 10.68.17.94: :real_connect(): (42000/1049): Unknown database 'testwikidatawiki'
Tue Jun 3 19:20:48 UTC 2014	deployment-apache01	testwiki	Connection error: No working slave server: Unknown error (10.68.17.94)
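
To check whether these errors line up with the times the read-only dialog was seen, the log lines can be bucketed by minute. The following is a rough Python sketch. It assumes each line has four tab-separated fields (timestamp, host, wiki, message), as in the excerpt above. The names parse_line, errors_per_minute and SAMPLE are made up for illustration, and SAMPLE just repeats three of the lines shown.

```python
from collections import Counter
from datetime import datetime

# Three lines copied from the excerpt above (fields assumed tab-separated).
SAMPLE = (
    "Tue Jun 3 17:17:09 UTC 2014\tdeployment-apache01\ttestwiki\t"
    "Error connecting to 10.68.17.94: :real_connect(): (42000/1049): "
    "Unknown database 'testwikidatawiki'\n"
    "Tue Jun 3 17:17:09 UTC 2014\tdeployment-apache01\ttestwiki\t"
    "Connection error: No working slave server: Unknown error (10.68.17.94)\n"
    "Tue Jun 3 17:50:48 UTC 2014\tdeployment-apache01\ttestwiki\t"
    "Error connecting to 10.68.17.94: :real_connect(): (42000/1049): "
    "Unknown database 'testwikidatawiki'\n"
)


def parse_line(line):
    """Split one log line into (timestamp, host, wiki, message)."""
    stamp, host, wiki, message = line.rstrip("\n").split("\t", 3)
    when = datetime.strptime(stamp, "%a %b %d %H:%M:%S UTC %Y")
    return when, host, wiki, message


def errors_per_minute(lines):
    """Count log entries per minute so bursts can be compared with test failures."""
    counts = Counter()
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        when, _host, _wiki, _message = parse_line(line)
        counts[when.replace(second=0)] += 1
    return counts


if __name__ == "__main__":
    for minute, count in sorted(errors_per_minute(SAMPLE.splitlines()).items()):
        print(minute.isoformat(), count)
```

Comparing the resulting minutes against the timestamps of the failed builds would show whether the two are actually related or just coincide.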
Comment 9 Chris McMahon 2014-06-03 20:18:53 UTC
I saw this just now also.
Comment 10 Chris McMahon 2014-07-23 21:36:05 UTC
Adding Sean Pringle.  This seems to be getting worse. I'd like to either update the db less often or else make it less disruptive.
Comment 12 Chris McMahon 2014-07-25 15:00:56 UTC
This seems to have been fixed by https://gerrit.wikimedia.org/r/#/c/149052/

Thanks Sam!
