Last modified: 2011-10-12 22:12:49 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and kept for historical purposes. It is not possible to log in, and apart from displaying bug reports and their history, links might be broken. See T33530, the corresponding Phabricator task, for complete and up-to-date bug report information.
Bug 31530 - Intermittent "cannot contact the database server" on https://en.wikipedia.org/
Status: RESOLVED FIXED
Product: Wikimedia
Classification: Unclassified
Component: General/Unknown
Version: unspecified
Hardware: All All
Importance: Highest normal
Target Milestone: ---
Assigned To: Nobody - You can work on this!
Keywords: ops
Reported: 2011-10-08 14:57 UTC by MZMcBride
Modified: 2011-10-12 22:12 UTC (History)
CC List: 7 users


Description MZMcBride 2011-10-08 14:57:59 UTC
I've intermittently been getting "(Cannot contact the database server: Unknown error (10.0.6.42))" errors on <https://en.wikipedia.org>. This most recent time happened when trying to preview an edit. I don't edit much, but I've gotten the error a few times over the past week. It'd be nice if someone could check the frequency of such errors and examine what the underlying issue is.
Comment 1 Mark A. Hershberger 2011-10-08 19:59:04 UTC
http://rt.wikimedia.org/Ticket/Display.html?id=1684
Comment 2 Platonides 2011-10-08 20:02:21 UTC
Not that useful for ordinary people. But if there's an RT ticket saying something like "db32 randomly drops connections", should this bug be closed?
Comment 3 MZMcBride 2011-10-09 02:21:21 UTC
(In reply to comment #2)
> Not that useful for ordinary people. But if there's an RT ticket saying
> something like "db32 randomly drops connections", should this bug be closed?

Unless that RT ticket contains top-secret information, Bugzilla should always take precedence. Ops needs to get better about using RT only when absolutely necessary.
Comment 4 Ryan Lane 2011-10-09 02:23:33 UTC
No. Definitely don't close bugs if an RT is created. We are looking for better ways to update both ways. I'd prefer we have a public way of tracking info.
Comment 5 MZMcBride 2011-10-12 04:37:51 UTC
Just got "(Can't contact the database server: Can't connect to local MySQL server through socket '/var/run/mysqld/mysqld.sock' (2) (localhost))" on https://en.wikipedia.org.
Comment 6 Mark A. Hershberger 2011-10-12 18:20:43 UTC
New reports here. CCing Tim and Asher, and bumping priority.
http://en.wikipedia.org/w/index.php?diff=455163422&oldid=455150842
Comment 7 Tim Starling 2011-10-12 18:33:46 UTC
(In reply to comment #5)
> Just got "(Can't contact the database server: Can't connect to local MySQL
> server through socket '/var/run/mysqld/mysqld.sock' (2) (localhost))" on
> https://en.wikipedia.org.

What was the URL? Was the error message inside a MediaWiki skin, or was it just a blank page with an error message? If the navigation elements were there, did they look normal, or was the site name incorrect?
Comment 8 Titoxd 2011-10-12 19:41:44 UTC
(In reply to comment #7)
> (In reply to comment #5)
> > Just got "(Can't contact the database server: Can't connect to local MySQL
> > server through socket '/var/run/mysqld/mysqld.sock' (2) (localhost))" on
> > https://en.wikipedia.org.
> 
> What was the URL? Was the error message inside a MediaWiki skin, or was it just
> a blank page with an error message? If the navigation elements were there, did
> they look normal, or was the site name incorrect?

I ran into the same error myself yesterday, but on http, not https. I found it when clicking on an internal link to http://en.wikipedia.org/wiki/2011_Pacific_hurricane_season. No MediaWiki skin was visible, just a white page with the localhost error message and a search bar. Unfortunately, I can't seem to replicate the problem consistently in any way.
Comment 9 Tim Starling 2011-10-12 20:34:35 UTC
When there's a connection error, a log entry is written by LoadBalancer, not Database. If an extension is creating its own Database objects with incorrect configuration, that would explain the lack of connection errors in dberror.log.
Comment 10 Tim Starling 2011-10-12 21:28:05 UTC
(In reply to comment #9)
> When there's a connection error, a log entry is written by LoadBalancer, not
> Database. If an extension is creating its own Database objects with incorrect
> configuration, that would explain the lack of connection errors in dberror.log.

Actually none of that is true. Maybe an extension could make these errors somehow, but I'm not sure how.
Comment 11 Platonides 2011-10-12 21:50:06 UTC
The linked diff says "the one I'm getting has a '(Can't contact the database server: Can't connect to local MySQL server through socket '/var/run/mysqld/mysqld.sock' (2) (localhost))' on it", which would be very wrong. It would be trying to connect to a MySQL server running on the Apaches!
Comment 12 Asher Feldman 2011-10-12 22:12:00 UTC
A change to the job queue system in 1.18, made to fix an issue where the job runners
were hammering the enwiki master, resulted in a high number of locks, triggering
this MySQL bug: http://bugs.mysql.com/bug.php?id=49047 (thanks domas!)

r99650 removes the lock issue, and since deploying it I haven't seen any connection
errors to db32. I am going to build and package mysql 5.1.52@fb in the near
future, which includes a fix for MySQL bug 49047, after which we can try reverting
r99650.

Considering the cause and fix, it definitely seems that Bugzilla was the correct place for this.
