Last modified: 2014-06-24 05:34:18 UTC
Variations on the following query get sniped for running too long on enwiki slaves: SELECT /* RandomPage::selectRandomPageFromDB */ page_title, page_namespace FROM `page` WHERE page_namespace = '446' AND page_is_redirect = '0' AND (page_random >= 0) ORDER BY page_random LIMIT 1; The above is nice and fast if the page_namespace value exists, but turns into a massive index scan if not (like 446!). ApiQueryRandom should check for a valid page_namespace first.
Actually, I picked on ApiQueryRandom because it *seems* to be the source of the queries. SpecialRandompage apparently already limits page_namespace values using MWNamespace::getContentNamespaces(), and nothing else in mediawiki-core seems to fit the profile. Perhaps a Developer can say more...
The example query above had a FORCE INDEX (page_random) removed during testing: mysql wmf db1043 root enwiki> explain SELECT page_title, page_namespace FROM `page` force index (page_random) WHERE page_namespace = '446' AND page_is_redirect = '0' AND (page_random >= 0) ORDER BY page_random LIMIT 1\G *************************** 1. row *************************** id: 1 select_type: SIMPLE table: page type: range possible_keys: page_random key: page_random key_len: 8 ref: NULL rows: 15821483 Extra: Using index condition; Using where 1 row in set (0.25 sec) Simply removing the force allows invalid namespaces to be optimized away: mysql wmf db1043 root enwiki> explain SELECT page_title, page_namespace FROM `page` WHERE page_namespace = '446' AND page_is_redirect = '0' AND (page_random >= 0) ORDER BY page_random LIMIT 1\G *************************** 1. row *************************** id: 1 select_type: SIMPLE table: page type: ref possible_keys: name_title,page_random,page_redirect_namespace_len key: name_title key_len: 4 ref: const rows: 1 Extra: Using where; Using filesort 1 row in set (0.25 sec) This is actually a problem with both RandomPage::selectRandomPageFromDB and ApiQueryRandom::runQuery; both are showing up in slow logs with large index scans. Seems like the FORCE INDEX (page_random) isn't necessary for the mediawiki pseudo-random approach to function. The second example above allows a filesort but this is much lower impact than the massive index scan. Consider removing the force so that more appropriate indexes can be chosen (or added).
(In reply to comment #0) > ApiQueryRandom should check for a valid page_namespace first. Note that 446 is a valid page_namespace, it's "Education Program" (there's a handy chart on [[WP:NS]]). I wonder if this is related to bug 58324?
Ah ok, thanks. I went on distinct enwiki page_namespace values. That makes removing the forced index the only solution.
Gerrit change #102880 fixed the "empty namespace" issue, so this should be closed unless you want to keep it open to address this comment left on that change: > page_random may need some reindexing too, either (page_namespace, page_random) > for best effect or (page_random, page_namespace) for lesser effect while still > avoiding filesort and reusing exiting page_random index in case anything else > still forces it.