Last modified: 2014-06-24 05:34:18 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and for historical purposes. It is not possible to log in and except for displaying bug reports and their history, links might be broken. See T59285, the corresponding Phabricator task for complete and up-to-date bug report information.
Bug 57285 - ApiQueryRandom allows slow queries for nonexistant page_namespace
ApiQueryRandom allows slow queries for nonexistant page_namespace
Status: RESOLVED FIXED
Product: MediaWiki
Classification: Unclassified
Database (Other open bugs)
1.21.x
All All
: Normal normal (vote)
: ---
Assigned To: Nobody - You can work on this!
: performance
Depends on:
Blocks:
  Show dependency treegraph
 
Reported: 2013-11-20 07:37 UTC by Sean Pringle
Modified: 2014-06-24 05:34 UTC (History)
2 users (show)

See Also:
Web browser: ---
Mobile Platform: ---
Assignee Huggle Beta Tester: ---


Attachments

Description Sean Pringle 2013-11-20 07:37:32 UTC
Variations on the following query get sniped for running too long on enwiki slaves:

SELECT /* RandomPage::selectRandomPageFromDB */ page_title, page_namespace FROM `page` WHERE page_namespace = '446' AND page_is_redirect = '0' AND (page_random >= 0) ORDER BY page_random LIMIT 1;

The above is nice and fast if the page_namespace value exists, but turns into a massive index scan if not (like 446!).

ApiQueryRandom should check for a valid page_namespace first.
Comment 1 Sean Pringle 2013-11-20 07:54:50 UTC
Actually, I picked on ApiQueryRandom because it *seems* to be the source of the queries.

SpecialRandompage apparently already limits page_namespace values using MWNamespace::getContentNamespaces(), and nothing else in mediawiki-core seems to fit the profile. Perhaps a Developer can say more...
Comment 2 Sean Pringle 2013-12-20 06:22:29 UTC
The example query above had a FORCE INDEX (page_random) removed during testing:

mysql wmf db1043 root enwiki> explain SELECT  page_title, page_namespace FROM `page` force index (page_random) WHERE page_namespace = '446' AND page_is_redirect = '0' AND (page_random >= 0) ORDER BY page_random LIMIT 1\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: page
         type: range
possible_keys: page_random
          key: page_random
      key_len: 8
          ref: NULL
         rows: 15821483
        Extra: Using index condition; Using where
1 row in set (0.25 sec)

Simply removing the force allows invalid namespaces to be optimized away:

mysql wmf db1043 root enwiki> explain SELECT  page_title, page_namespace FROM `page` WHERE page_namespace = '446' AND page_is_redirect = '0' AND (page_random >= 0) ORDER BY page_random LIMIT 1\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: page
         type: ref
possible_keys: name_title,page_random,page_redirect_namespace_len
          key: name_title
      key_len: 4
          ref: const
         rows: 1
        Extra: Using where; Using filesort
1 row in set (0.25 sec)

This is actually a problem with both RandomPage::selectRandomPageFromDB and ApiQueryRandom::runQuery; both are showing up in slow logs with large index scans.

Seems like the FORCE INDEX (page_random) isn't necessary for the mediawiki pseudo-random approach to function. The second example above allows a filesort but this is much lower impact than the massive index scan. Consider removing the force so that more appropriate indexes can be chosen (or added).
Comment 3 Kunal Mehta (Legoktm) 2013-12-20 06:27:10 UTC
(In reply to comment #0)

> ApiQueryRandom should check for a valid page_namespace first.

Note that 446 is a valid page_namespace, it's "Education Program" (there's a handy chart on [[WP:NS]]). I wonder if this is related to bug 58324?
Comment 4 Sean Pringle 2013-12-20 06:39:53 UTC
Ah ok, thanks. I went on distinct enwiki page_namespace values. That makes removing the forced index the only solution.
Comment 5 Brad Jorsch 2013-12-20 16:55:41 UTC
Gerrit change #102880 fixed the "empty namespace" issue, so this should be closed unless you want to keep it open to address this comment left on that change:

> page_random may need some reindexing too, either (page_namespace, page_random)
> for best effect or (page_random, page_namespace) for lesser effect while still
> avoiding filesort and reusing exiting page_random index in case anything else
> still forces it.

Note You need to log in before you can comment on or make changes to this bug.


Navigation
Links