
Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and kept for historical purposes. It is not possible to log in; beyond displaying bug reports and their history, links may be broken. See T58840, the corresponding Phabricator task, for complete and up-to-date bug report information.
Bug 56840 - Special:Allpages is too slow!
Status: RESOLVED FIXED
Product: MediaWiki
Classification: Unclassified
Component: Special pages
Version: unspecified
Hardware: All
OS: All
Priority: Highest
Severity: normal
Target Milestone: ---
Assigned To: Nobody - You can work on this!
Keywords: performance
Depends on:
Blocks:
Reported: 2013-11-09 15:19 UTC by C. Scott Ananian
Modified: 2014-05-11 20:05 UTC
CC: 10 users

See Also:
Web browser: ---
Mobile Platform: ---


Attachments

Description C. Scott Ananian 2013-11-09 15:19:08 UTC
The following query was pulled from ishmael:

/* SpecialAllpages::showToplevel */ select page_title from `page` where page_namespace = ? and page_is_redirect = ? and (page_title >= ?) and (page_title >= ?) order by page_title limit ?

Real sample:

SELECT /* SpecialAllpages::showToplevel 108.35.125.176 */  page_title  FROM `page`   WHERE page_namespace = '0' AND page_is_redirect = '0' AND (page_title >= '1887') AND (page_title >= 'Centennial_Trail_State_Park')  ORDER BY page_title ASC LIMIT 86786,2

Hundreds of concurrent instances of this have been seen. They touched ~9M rows each, took over 100s, and had problematic clauses like that amazing LIMIT. They are not duplicates, having different page_title values and limit offsets (yet always a limit count of 2).

Furthermore, batches of this query have been increasing in frequency.

The query itself comes from:
http://git.wikimedia.org/blob/mediawiki%2Fcore.git/ceba5987b0d36f134ffcd9e4f704d1608fe88b79/includes%2Fspecials%2FSpecialAllpages.php#L225
including that wonderful 'limit ...,2'.

It has this wonderful comment (at http://git.wikimedia.org/blob/mediawiki%2Fcore.git/ceba5987b0d36f134ffcd9e4f704d1608fe88b79/includes%2Fspecials%2FSpecialAllpages.php#L175):

# TODO: Either make this *much* faster or cache the title index points
# in the querycache table.

...so that should probably be done.

My best guess is that some third-party crawler is trying to obtain the titles of all articles by paging through Special:Allpages --- and, as we found out the hard way --- generating the list that way is O(N^3), where N is how many pages into the list you are.

If we "cach[ing] the title index points" we can get the complexity down to O(N^2).  Which is still bad, but an improvement.

To make each page request (amortized) O(1), we need to generate the index points for the entire list of articles in a single traversal (possibly done in chunks, possibly by a background task).
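
As a rough illustration of the single-traversal idea, the index points could be computed in one ordered scan instead of one large-OFFSET query per section. This is only a sketch: it relies on window functions (MariaDB 10.2+ / MySQL 8.0+, newer than anything deployed here at the time), and the section size of 86786 is simply the value seen in the sample query above.

-- Sketch: one index scan that emits every section's starting title.
SELECT page_title
FROM (
    SELECT page_title,
           ROW_NUMBER() OVER (ORDER BY page_title) AS rn
    FROM page
    WHERE page_namespace = 0
      AND page_is_redirect = 0
) AS ordered_titles
WHERE (rn - 1) % 86786 = 0   -- hypothetical section size taken from the sample query
ORDER BY page_title;

The result (one boundary title per section) is the kind of "title index points" set that the TODO above suggests caching in the querycache table.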
Comment 1 Faidon Liambotis 2013-11-09 15:37:58 UTC
Urgency highest, as this is actively causing problems in production right now; in fact, the above query was an outlier in yesterday's incident investigation.
Comment 2 MZMcBride 2013-11-09 15:42:30 UTC
(In reply to comment #0)
> My best guess is that some third-party crawler is trying to obtain the titles
> of all articles by paging through Special:Allpages --- and, as we found out
> the hard way --- generating the list that way is O(N^3), where N is how many
> pages into the list you are.

No crawler or similarly automated process should be hitting index.php?title=Special:AllPages. MediaWiki has a robust Web API at api.php?action=query&list=allpages.

If a particular crawler or two are disrupting production, blocking those crawlers by IP or User-Agent would be the best temporary fix. Though obviously we'll need to evaluate and address the underlying query issue as well.
Comment 3 Nik Everett 2013-11-09 19:08:27 UTC
You can probably make the query a lot faster by removing the offset and adding a WHERE page_title > $last_title$ clause. It changes the feature slightly, but iterating on something unique that you are sorted on and have indexed is generally pretty quick, and it prevents the duplicates you can get with OFFSET clauses. I think this is worth doing even if we don't want things crawling the endpoint, because it closes an attack vector.
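
For illustration, here are the two pagination styles side by side, using the table and values from the sample query in the description (the page size of 345 is a made-up example):

-- Offset pagination (current behaviour): the server walks the namespace/title
-- index from the start title and discards 86786 entries before returning rows.
SELECT page_title FROM page
WHERE page_namespace = 0 AND page_is_redirect = 0
  AND page_title >= 'Centennial_Trail_State_Park'
ORDER BY page_title
LIMIT 86786, 2;

-- Keyset ("seek") pagination, as suggested above: remember the last title
-- already shown and continue strictly after it, so the cost is proportional
-- to the page size rather than the offset.
SELECT page_title FROM page
WHERE page_namespace = 0 AND page_is_redirect = 0
  AND page_title > 'Centennial_Trail_State_Park'  -- last title from the previous page
ORDER BY page_title
LIMIT 345;  -- example page size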
Comment 4 Sean Pringle 2013-11-11 01:35:12 UTC
After the outage mentioned in comment #1, there are temporary pt-kill jobs running on S1 slaves to snipe this query if it runs over 30s or appears in batches. This is to keep the slaves alive.

If people as well as crawlers are affected, that's why.
Comment 5 Tim Starling 2013-11-11 02:21:16 UTC
The hierarchical contents list feature is a cute nostalgic reference to the idea of volumes of paper encyclopedias, but it is difficult to implement efficiently. I think it's doubtful that anyone derives a significant benefit from it on a wiki the size of en.wikipedia.org. We could just replace it with a simple AlphabeticPager, like the one on Special:ListUsers.

That is, unless someone has an idea of how to make this truly efficient, like O(1) in the total number of articles on the wiki? Surely we would set the bar that high if we were introducing the feature today.
Comment 6 Gerrit Notification Bot 2013-11-11 02:44:26 UTC
Change 94690 had a related patch set uploaded by Tim Starling:
In Special:AllPages, limit the size of hierarchical lists

https://gerrit.wikimedia.org/r/94690
Comment 7 Gerrit Notification Bot 2013-11-13 00:07:58 UTC
Change 94690 merged by jenkins-bot:
In Special:AllPages, limit the size of hierarchical lists

https://gerrit.wikimedia.org/r/94690
Comment 8 Ori Livneh 2013-11-13 00:38:13 UTC
Tim's patch resolves the immediate operational issue. I moved the feature discussion to wikitech-l: <http://lists.wikimedia.org/pipermail/wikitech-l/2013-November/073052.html>.
Comment 9 Gerrit Notification Bot 2013-11-13 01:01:20 UTC
Change 95084 had a related patch set uploaded by Ori.livneh:
In Special:AllPages, limit the size of hierarchical lists

https://gerrit.wikimedia.org/r/95084
Comment 10 Gerrit Notification Bot 2013-11-13 01:01:52 UTC
Change 95085 had a related patch set uploaded by Ori.livneh:
In Special:AllPages, limit the size of hierarchical lists

https://gerrit.wikimedia.org/r/95085
Comment 11 Gerrit Notification Bot 2013-11-13 01:12:14 UTC
Change 95084 merged by jenkins-bot:
In Special:AllPages, limit the size of hierarchical lists

https://gerrit.wikimedia.org/r/95084
Comment 12 Gerrit Notification Bot 2013-11-13 01:13:56 UTC
Change 95085 merged by jenkins-bot:
In Special:AllPages, limit the size of hierarchical lists

https://gerrit.wikimedia.org/r/95085
Comment 13 Andre Klapper 2013-11-13 11:39:04 UTC
Don't see any remaining open patches here. Reclosing.
Comment 14 Gabriel Wicke 2013-11-13 20:48:18 UTC
Out of curiosity, do we actually know why the section query seems to touch all pages? From what I can tell the offset is always <# of pages in wiki> / 100, and the second >= page_title clause is incremented on each iteration. Maybe the MySQL optimizer is not clever enough to figure out that it can drop the first page_title clause:

(page_title >= '1887') AND (page_title >= 'Centennial_Trail_State_Park')

If that is the case, then changing the query to only use the second clause should bring the complexity down to O(N) per page for O(N^2) total.
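
A sketch of the simplified form suggested here, i.e. the sample query from the description with the redundant '1887' bound dropped so the range scan can only start at the tighter title:

SELECT page_title FROM page
WHERE page_namespace = 0
  AND page_is_redirect = 0
  AND page_title >= 'Centennial_Trail_State_Park'  -- only the tighter bound kept
ORDER BY page_title
LIMIT 86786, 2;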
Comment 15 C. Scott Ananian 2013-11-13 22:53:49 UTC
I think it's actually the OFFSET clause (in this case, the first element of the LIMIT tuple) which is slowing things down.  There isn't an effective index that can quickly give you the "86786th article after Centennial_Trail_State_Park".  So instead it's sequentially scanning through 86,786 entries (and the offset grows the further you go).
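
One way to check which effect dominates on a slave (a sketch; Handler_read_next counts rows read in index order, and redirects filtered out along the way would add to the total):

SHOW SESSION STATUS LIKE 'Handler_read_next';
SELECT page_title FROM page
WHERE page_namespace = 0 AND page_is_redirect = 0
  AND page_title >= '1887' AND page_title >= 'Centennial_Trail_State_Park'
ORDER BY page_title
LIMIT 86786, 2;
-- If the OFFSET is the culprit, the counter delta should be at least roughly
-- offset + limit (~86788): every skipped row is still read. If the optimizer
-- instead started the range at the looser '1887' bound, the delta would be
-- far larger.
SHOW SESSION STATUS LIKE 'Handler_read_next';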
Comment 16 MZMcBride 2013-11-14 00:39:41 UTC
I'm curious how this seemingly suddenly became a problem. Was this a MariaDB performance regression?
Comment 17 Tim Starling 2013-11-14 00:48:22 UTC
(In reply to comment #16)
> I'm curious how this seemingly suddenly became a problem. Was this a MariaDB
> performance regression?

See the original report:

> My best guess is that some third-party crawler is trying to obtain the titles
> of all articles by paging through Special:Allpages --- and, as we found out
> the hard way --- generating the list that way is O(N^3), where N is how many
> pages into the list you are.
Comment 18 Gabriel Wicke 2013-11-14 01:32:30 UTC
(In reply to comment #15)
> I think it's actually the OFFSET clause (in this case, the first element of
> the LIMIT tuple) which is slowing things down.  There isn't an effective
> index that can quickly give you the "86786th article after
> Centennial_Trail_State_Park".  So instead it's sequentially scanning through
> 86,786 entries

Yeah, that's 1/100 of 8.6 million pages. Very close to the nine million rows from the original report.

> (and the offset grows the further you go).

Are we sure that this is the case? My reading of the code suggests that it remains constant per wiki.
Comment 19 C. Scott Ananian 2013-11-14 15:45:07 UTC
(In reply to comment #18)
> (In reply to comment #15)
> > (and the offset grows the further you go).
> 
> Are we sure that this is the case? My reading of the code suggests that it
> remains constant per wiki.

Re-reading, you're right.  Still O(N^3) to fetch all titles from a wiki with N titles because the offset, although constant, is O(N).  O(N) 'pages' * O(N) 'queries per page' * O(N) 'rows touched per query due to the offset'.

(Hopefully $dbr->estimateRowCount() executes in O(1) time!)
Comment 20 Nik Everett 2013-11-14 15:52:35 UTC
(In reply to comment #19)
> (Hopefully $dbr->estimateRowCount() executes in O(1) time!)

For MySQL, PostgreSQL, and MSSQL it is O(1) because it uses an EXPLAIN.  Looks like SQLite and Oracle do it with a COUNT(*) though.  I know Oracle hides its explain output behind obtuse views that you may or may not have access to.  I'm not sure about SQLite.
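
Roughly, the two strategies being compared, shown as raw SQL (a sketch; the actual statements are built by MediaWiki's Database classes):

-- MySQL/MariaDB (and similarly PostgreSQL): read the optimizer's row estimate
-- from the query plan without touching the table data.
EXPLAIN SELECT 1 FROM page WHERE page_namespace = 0 AND page_is_redirect = 0;

-- The SQLite/Oracle fallback mentioned above: an actual count, which scans
-- the matching rows.
SELECT COUNT(*) FROM page WHERE page_namespace = 0 AND page_is_redirect = 0;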
Comment 21 Gabriel Wicke 2013-11-19 19:30:50 UTC
(In reply to comment #19)
> (In reply to comment #18)
> > (In reply to comment #15)
> > > (and the offset grows the further you go).
> > 
> > Are we sure that this is the case? My reading of the code suggests that it
> > remains constant per wiki.
> 
> Re-reading, you're right.  Still O(N^3) to fetch all titles from a wiki with
> N titles because the offset, although constant, is O(N).  O(N) 'pages' *
> O(N) 'queries per page' * O(N) 'rows touched per query due to the offset'.

O(N/100) (titles per section) times 100 (sections) is still O(N), so O(N^2) total. Note that the number of sections is constant, not O(N).
