Last modified: 2014-09-08 10:47:02 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and for historical purposes. It is not possible to log in and except for displaying bug reports and their history, links might be broken. See T64528, the corresponding Phabricator task for complete and up-to-date bug report information.
Bug 62528 - Flow: don't perform reads on DB_MASTER
Flow: don't perform reads on DB_MASTER
Status: RESOLVED WONTFIX
Product: MediaWiki extensions
Classification: Unclassified
Flow (Other open bugs)
unspecified
All All
: Normal normal (vote)
: ---
Assigned To: Nobody - You can work on this!
:
Depends on:
Blocks:
  Show dependency treegraph
 
Reported: 2014-03-11 13:17 UTC by Matthias Mullie
Modified: 2014-09-08 10:47 UTC (History)
4 users (show)

See Also:
Web browser: ---
Mobile Platform: ---
Assignee Huggle Beta Tester: ---


Attachments

Description Matthias Mullie 2014-03-11 13:17:23 UTC
There's a couple of places where we perform read queries against DB_MASTER. We shouldn't be doing that, it kind of defeats the purpose of having slave DBs anyway, if we keep directing reads to the same master server, and it may eventually get overloaded.

Breaking down the problem:

* we have to make sure no (potentially) lagged slave data is cached.

We only cache DB-hits, not DB-misses. That's good, this means that if a lagged DB says some data is not there, that won't be saved to cache where it could persist for 3 more days. Instead, as soon as slave DBs have caught up, the most recent changes will appear.

However, there may be cases where we've *updated* existing DB data. In that case, it may still be possible for a lagged DB server to respond with (outdated) data that will be stored in cache until it expires.

* we have to make sure chronology is respected.

Core has ChronologyProtector in place to make sure that every consecutive request is always served by a DB at least as recent as the previous request. This already works on external DBs too (https://gerrit.wikimedia.org/r/#/c/58691/)

* we have to make sure same-request queries don't query lagged slaves after data was written to master.

ChronologyProtector only kicks in at the beginning of a request, not for every query. So when we're performing writes to DB_MASTER, we should make sure that any follow-up read query to DB_SLAVE is to a slave that is at least in the same state DB_MASTER is in.

In AFTv5, I've fixed this by checking in the own getDB() wrapper. If $db is DB_MASTER, I 
mark some variable as true, and make sure that any follow-up DB_SLAVE call is converted to DB_MASTER if that variable was set to true. This auto-direct all (and only those) calls to DB_SLAVE if we've just written new data to master.
That code is at: https://git.wikimedia.org/blob/mediawiki%2Fextensions%2FArticleFeedbackv5/60ea34d65207b5b71a6a617ff0a2129326ac9442/data%2FDataModelBackend.LBFactory.php#L36

We should address #3, and make sure that #1 can never happen.

Anything else I missed?
Comment 1 Matthias Mullie 2014-03-11 13:24:24 UTC
And, obviously, after those issues have been fixed, we should actually change all getDB( DB_MASTER ) calls for read-purposes to DB_SLAVE ;)
Comment 2 Matthias Mullie 2014-09-08 10:47:02 UTC
I think we've got this pretty much covered ATM. Let's close this and file new bugs should specific issues arise later.

Note You need to log in before you can comment on or make changes to this bug.


Navigation
Links