Last modified: 2012-08-10 23:15:57 UTC
As reported by Asher Feldman: 73% of all queries to the enwiki master representing 58% of query execution time are of the one following type, which can be almost fully eliminated with caching: SELECT /* Revision::newFromTitle n.n.n.n */ page_latest FROM `page` WHERE page_namespace = '10' AND page_title = 'Cite_book' LIMIT 1
Why are these hitting the master? I guess more info will be needed on the callers. They should probably be audited.
(In reply to comment #1) > Why are these hitting the master? I guess more info will be needed on the > callers. They should probably be audited. Looks like there is no master flag. What is probably happening is that new edits are being made, then this function is called, and the master is hit since the slaves don't have the data.
Even if many of these queries could (or do) go to slaves, I don't think they should happen at all. Memcached could be used quite effectively here if keys were actively set on revision save (or deleted on revision/page delete).
(In reply to comment #3) > Even if many of these queries could (or do) go to slaves, I don't think they > should happen at all. Memcached could be used quite effectively here if keys > were actively set on revision save (or deleted on revision/page delete). I actually misread this. I was thinking of the revision fetch. That only falls back to the master. But the query being spammed in this report is the page_latest fetch, which is always from the master no matter what (on each page view...).
There's a draft in Gerrit that Aaron is working on now (private only because there's not yet a "public draft" feature in Gerrit). Once Aaron gets this reviewed by Asher on the performance side, he'll publish a changeset for normal code review.
It looks like the change in 1.20wmf8 that was supposed to shift some of these to slaves didn't have an impact. In the last 24 hours, select page_latest from Revision::newFromTitle represents 72% of all enwiki master queries. Around 1/3 of these are coming from job runners, but the other 2/3 are from web requests.
I was just tracking queries to the enwiki master from my ip address while browsing wikipedia as a logged out user. Loading https://en.wikipedia.org/wiki/The_Squire_of_Long_Hadley resulted in 19 Revision::newFromTitle SELECT queries to enwiki master, as follows. Repeated reloads result in the last three selects every time. I would guess that the first load required a parse, and that for this page there are three page_latest master queries for each pcache hit. SELECT /* Revision::newFromTitle 98.210.186.194 */ page_latest FROM `page` WHERE page_namespace = '10' AND page_title = 'Imdb_title' LIMIT 1 SELECT /* Revision::newFromTitle 98.210.186.194 */ page_latest FROM `page` WHERE page_namespace = '10' AND page_title = 'IMDb_title' LIMIT 1 SELECT /* Revision::newFromTitle 98.210.186.194 */ page_latest FROM `page` WHERE page_namespace = '10' AND page_title = 'CinemaoftheUK' LIMIT 1 SELECT /* Revision::newFromTitle 98.210.186.194 */ page_latest FROM `page` WHERE page_namespace = '10' AND page_title = 'Cinema_of_the_United_Kingdom' LIMIT 1 SELECT /* Revision::newFromTitle 98.210.186.194 */ page_latest FROM `page` WHERE page_namespace = '10' AND page_title = 'Navbox' LIMIT 1 SELECT /* Revision::newFromTitle 98.210.186.194 */ page_latest FROM `page` WHERE page_namespace = '10' AND page_title = 'Flagicon' LIMIT 1 SELECT /* Revision::newFromTitle 98.210.186.194 */ page_latest FROM `page` WHERE page_namespace = '10' AND page_title = 'Country_data_United_Kingdom' LIMIT 1 SELECT /* Revision::newFromTitle 98.210.186.194 */ page_latest FROM `page` WHERE page_namespace = '10' AND page_title = 'Flagicon/core' LIMIT 1 SELECT /* Revision::newFromTitle 98.210.186.194 */ page_latest FROM `page` WHERE page_namespace = '10' AND page_title = 'Navbar' LIMIT 1 SELECT /* Revision::newFromTitle 98.210.186.194 */ page_latest FROM `page` WHERE page_namespace = '10' AND page_title = 'Transclude' LIMIT 1 SELECT /* Revision::newFromTitle 98.210.186.194 */ page_latest FROM `page` WHERE page_namespace = '10' AND page_title = 'Navbox_subgroup' LIMIT 1 SELECT /* Revision::newFromTitle 98.210.186.194 */ page_latest FROM `page` WHERE page_namespace = '10' AND page_title = 'Icon' LIMIT 1 SELECT /* Revision::newFromTitle 98.210.186.194 */ page_latest FROM `page` WHERE page_namespace = '10' AND page_title = '!' LIMIT 1 SELECT /* Revision::newFromTitle 98.210.186.194 */ page_latest FROM `page` WHERE page_namespace = '10' AND page_title = 'Sinclair_Hill' LIMIT 1 SELECT /* Revision::newFromTitle 98.210.186.194 */ page_latest FROM `page` WHERE page_namespace = '10' AND page_title = '1920s-UK-film-stub' LIMIT 1 SELECT /* Revision::newFromTitle 98.210.186.194 */ page_latest FROM `page` WHERE page_namespace = '10' AND page_title = 'Asbox' LIMIT 1 SELECT /* Revision::newFromTitle 98.210.186.194 */ page_latest FROM `page` WHERE page_namespace = '8' AND page_title = 'Gadget-ReferenceTooltips.js' LIMIT 1 SELECT /* Revision::newFromTitle 98.210.186.194 */ page_latest FROM `page` WHERE page_namespace = '8' AND page_title = 'Gadget-DRN-wizard-loader.js' LIMIT 1 SELECT /* Revision::newFromTitle 98.210.186.194 */ page_latest FROM `page` WHERE page_namespace = '8' AND page_title = 'Common.js/secure_new.js' LIMIT 1
The jobs ones are refreshLinks and the webServer ones are from random parses. Both involve parsing. Tim was afraid that if we use the slaves for parsing, then when a templates changes and refreshLinks jobs are enqueued, that when they run, they might see the old templates due to slave lag. I wonder if we can have jobs can't start for a few seconds or that have a master position recorded so ChronologyProtector could be used instead of hitting the master for each template. Even without prefill caching, it seems like something can be done here.
ChronologyProtector seems like it could be a good bet for moving these to slave reads... Row caching would be even nicer :) What about these three: SELECT /* Revision::newFromTitle 98.210.186.194 */ page_latest FROM `page` WHERE page_namespace = '8' AND page_title = 'Gadget-ReferenceTooltips.js' LIMIT 1 SELECT /* Revision::newFromTitle 98.210.186.194 */ page_latest FROM `page` WHERE page_namespace = '8' AND page_title = 'Gadget-DRN-wizard-loader.js' LIMIT 1 SELECT /* Revision::newFromTitle 98.210.186.194 */ page_latest FROM `page` WHERE page_namespace = '8' AND page_title = 'Common.js/secure_new.js' LIMIT 1 Which were being executed on every parsercache derived view of the above linked page?
https://gerrit.wikimedia.org/r/#/c/17379/ was already merged to master for this.
(In reply to comment #10) > https://gerrit.wikimedia.org/r/#/c/17379/ was already merged to master for > this. Re-title and closed. Caching is still in the works, but is not needed to reduce the load here now.