Last modified: 2012-08-10 23:15:57 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and for historical purposes. It is not possible to log in and except for displaying bug reports and their history, links might be broken. See T39030, the corresponding Phabricator task for complete and up-to-date bug report information.
Bug 37030 - Reduce DB master load from Revision::newFromTitle
Reduce DB master load from Revision::newFromTitle
Status: RESOLVED FIXED
Product: MediaWiki
Classification: Unclassified
Database (Other open bugs)
1.20.x
All All
: Normal normal (vote)
: ---
Assigned To: Aaron Schulz
: performance, platformeng
Depends on:
Blocks:
  Show dependency treegraph
 
Reported: 2012-05-22 18:52 UTC by Siebrand Mazeland
Modified: 2012-08-10 23:15 UTC (History)
9 users (show)

See Also:
Web browser: ---
Mobile Platform: ---
Assignee Huggle Beta Tester: ---


Attachments

Description Siebrand Mazeland 2012-05-22 18:52:09 UTC
As reported by Asher Feldman: 73% of all queries to the enwiki master representing 58% of query execution time are of the one following type, which can be almost fully eliminated with caching:  SELECT  /* Revision::newFromTitle n.n.n.n */ page_latest FROM `page` WHERE page_namespace = '10' AND page_title = 'Cite_book' LIMIT 1
Comment 1 Aaron Schulz 2012-05-22 18:54:24 UTC
Why are these hitting the master? I guess more info will be needed on the callers. They should probably be audited.
Comment 2 Aaron Schulz 2012-05-22 21:36:45 UTC
(In reply to comment #1)
> Why are these hitting the master? I guess more info will be needed on the
> callers. They should probably be audited.

Looks like there is no master flag. What is probably happening is that new edits are being made, then this function is called, and the master is hit since the slaves don't have the data.
Comment 3 Asher Feldman 2012-05-22 22:06:36 UTC
Even if many of these queries could (or do) go to slaves, I don't think they should happen at all.  Memcached could be used quite effectively here if keys were actively set on revision save (or deleted on revision/page delete).
Comment 4 Aaron Schulz 2012-05-22 23:28:23 UTC
(In reply to comment #3)
> Even if many of these queries could (or do) go to slaves, I don't think they
> should happen at all.  Memcached could be used quite effectively here if keys
> were actively set on revision save (or deleted on revision/page delete).

I actually misread this. I was thinking of the revision fetch. That only falls back to the master. But the query being spammed in this report is the page_latest fetch, which is always from the master no matter what (on each page view...).
Comment 5 Rob Lanphier 2012-06-29 18:47:49 UTC
There's a draft in Gerrit that Aaron is working on now (private only because there's not yet a "public draft" feature in Gerrit).  Once Aaron gets this reviewed by Asher on the performance side, he'll publish a changeset for normal code review.
Comment 6 Asher Feldman 2012-08-01 19:24:04 UTC
It looks like the change in 1.20wmf8 that was supposed to shift some of these to slaves didn't have an impact.  In the last 24 hours, select page_latest from Revision::newFromTitle represents 72% of all enwiki master queries.  Around 1/3 of these are coming from job runners, but the other 2/3 are from web requests.
Comment 7 Asher Feldman 2012-08-01 19:30:39 UTC
I was just tracking queries to the enwiki master from my ip address while browsing wikipedia as a logged out user.  Loading https://en.wikipedia.org/wiki/The_Squire_of_Long_Hadley resulted in 19 Revision::newFromTitle SELECT queries to enwiki master, as follows.  Repeated reloads result in the last three selects every time.  I would guess that the first load required a parse, and that for this page there are three page_latest master queries for each pcache hit.

SELECT /* Revision::newFromTitle 98.210.186.194 */  page_latest  FROM `page`  WHERE page_namespace = '10' AND page_title = 'Imdb_title'  LIMIT 1  
SELECT /* Revision::newFromTitle 98.210.186.194 */  page_latest  FROM `page`  WHERE page_namespace = '10' AND page_title = 'IMDb_title'  LIMIT 1  
SELECT /* Revision::newFromTitle 98.210.186.194 */  page_latest  FROM `page`  WHERE page_namespace = '10' AND page_title = 'CinemaoftheUK'  LIMIT 1  
SELECT /* Revision::newFromTitle 98.210.186.194 */  page_latest  FROM `page`  WHERE page_namespace = '10' AND page_title = 'Cinema_of_the_United_Kingdom'  LIMIT 1  
SELECT /* Revision::newFromTitle 98.210.186.194 */  page_latest  FROM `page`  WHERE page_namespace = '10' AND page_title = 'Navbox'  LIMIT 1  
SELECT /* Revision::newFromTitle 98.210.186.194 */  page_latest  FROM `page`  WHERE page_namespace = '10' AND page_title = 'Flagicon'  LIMIT 1  
SELECT /* Revision::newFromTitle 98.210.186.194 */  page_latest  FROM `page`  WHERE page_namespace = '10' AND page_title = 'Country_data_United_Kingdom'  LIMIT 1  
SELECT /* Revision::newFromTitle 98.210.186.194 */  page_latest  FROM `page`  WHERE page_namespace = '10' AND page_title = 'Flagicon/core'  LIMIT 1  
SELECT /* Revision::newFromTitle 98.210.186.194 */  page_latest  FROM `page`  WHERE page_namespace = '10' AND page_title = 'Navbar'  LIMIT 1  
SELECT /* Revision::newFromTitle 98.210.186.194 */  page_latest  FROM `page`  WHERE page_namespace = '10' AND page_title = 'Transclude'  LIMIT 1  
SELECT /* Revision::newFromTitle 98.210.186.194 */  page_latest  FROM `page`  WHERE page_namespace = '10' AND page_title = 'Navbox_subgroup'  LIMIT 1  
SELECT /* Revision::newFromTitle 98.210.186.194 */  page_latest  FROM `page`  WHERE page_namespace = '10' AND page_title = 'Icon'  LIMIT 1  
SELECT /* Revision::newFromTitle 98.210.186.194 */  page_latest  FROM `page`  WHERE page_namespace = '10' AND page_title = '!'  LIMIT 1  
SELECT /* Revision::newFromTitle 98.210.186.194 */  page_latest  FROM `page`  WHERE page_namespace = '10' AND page_title = 'Sinclair_Hill'  LIMIT 1  
SELECT /* Revision::newFromTitle 98.210.186.194 */  page_latest  FROM `page`  WHERE page_namespace = '10' AND page_title = '1920s-UK-film-stub'  LIMIT 1  
SELECT /* Revision::newFromTitle 98.210.186.194 */  page_latest  FROM `page`  WHERE page_namespace = '10' AND page_title = 'Asbox'  LIMIT 1  
SELECT /* Revision::newFromTitle 98.210.186.194 */  page_latest  FROM `page`  WHERE page_namespace = '8' AND page_title = 'Gadget-ReferenceTooltips.js'  LIMIT 1  
SELECT /* Revision::newFromTitle 98.210.186.194 */  page_latest  FROM `page`  WHERE page_namespace = '8' AND page_title = 'Gadget-DRN-wizard-loader.js'  LIMIT 1  
SELECT /* Revision::newFromTitle 98.210.186.194 */  page_latest  FROM `page`  WHERE page_namespace = '8' AND page_title = 'Common.js/secure_new.js'  LIMIT 1
Comment 8 Aaron Schulz 2012-08-01 19:36:40 UTC
The jobs ones are refreshLinks and the webServer ones are from random parses. Both involve parsing.

Tim was afraid that if we use the slaves for parsing, then when a templates changes and refreshLinks jobs are enqueued, that when they run, they might see the old templates due to slave lag.

I wonder if we can have jobs can't start for a few seconds or that have a master position recorded so ChronologyProtector could be used instead of hitting the master for each template. Even without prefill caching, it seems like something can be done here.
Comment 9 Asher Feldman 2012-08-08 21:38:34 UTC
ChronologyProtector seems like it could be a good bet for moving these to slave reads... Row caching would be even nicer :)

What about these three:

SELECT /* Revision::newFromTitle 98.210.186.194 */  page_latest  FROM `page` 
WHERE page_namespace = '8' AND page_title = 'Gadget-ReferenceTooltips.js' 
LIMIT 1  
SELECT /* Revision::newFromTitle 98.210.186.194 */  page_latest  FROM `page` 
WHERE page_namespace = '8' AND page_title = 'Gadget-DRN-wizard-loader.js' 
LIMIT 1  
SELECT /* Revision::newFromTitle 98.210.186.194 */  page_latest  FROM `page` 
WHERE page_namespace = '8' AND page_title = 'Common.js/secure_new.js'  LIMIT 1

Which were being executed on every parsercache derived view of the above linked page?
Comment 10 Aaron Schulz 2012-08-09 01:06:21 UTC
https://gerrit.wikimedia.org/r/#/c/17379/ was already merged to master for this.
Comment 11 Aaron Schulz 2012-08-10 23:15:57 UTC
(In reply to comment #10)
> https://gerrit.wikimedia.org/r/#/c/17379/ was already merged to master for
> this.

Re-title and closed. Caching is still in the works, but is not needed to reduce the load here now.

Note You need to log in before you can comment on or make changes to this bug.


Navigation
Links