Last modified: 2012-07-14 03:44:29 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and kept for historical purposes. It is not possible to log in, and apart from the display of bug reports and their history, links might be broken. See T36568, the corresponding Phabricator task, for complete and up-to-date bug report information.
Bug 34568 - Allow selecting all namespaces on Special:UnreviewedPages
Status: RESOLVED LATER
Product: MediaWiki extensions
Classification: Unclassified
Component: FlaggedRevs
Version: unspecified
Hardware: All All
Importance: Unprioritized enhancement
Target Milestone: ---
Assigned To: Nobody - You can work on this!
Reported: 2012-02-21 17:46 UTC by Umherirrender
Modified: 2012-07-14 03:44 UTC


Description Umherirrender 2012-02-21 17:46:07 UTC
The special page Special:UnreviewedPages offers no way to select all namespaces. In combination with a category filter, this is sometimes necessary, because a category can contain reviewable pages from different namespaces. Please add an "(all)" option to the namespace selector, like the one on Special:PendingChanges. Thanks.
Comment 1 Aaron Schulz 2012-02-22 07:28:55 UTC
This is not offered, for performance reasons. There is no (page_title) index, only (page_namespace,page_title). We only want to query pages whose page_namespace is a reviewable namespace, so we want to use the (page_namespace,page_title) index. If we want "all reviewable namespaces", we have to sort/page on (namespace,title) rather than just (title) to avoid an explicit sort step.

If we sorted/paged on page_id, an "(all)" option would be doable, though slower given that the "page_namespace is reviewable" condition would not use an index.
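
As a rough illustration of the paging described in this comment, the sketch below uses Python with an in-memory SQLite database. The stripped-down `page` table, the index name `page_ns_title`, the sample rows, the `REVIEWABLE` set, and the `next_page` helper are all assumptions made for the example, not FlaggedRevs code. It pages on (page_namespace, page_title) so that a composite index in that order can return rows already sorted.

```python
import sqlite3

# Hypothetical, stripped-down stand-in for the MediaWiki page table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE page ("
    " page_id INTEGER PRIMARY KEY,"
    " page_namespace INTEGER NOT NULL,"
    " page_title TEXT NOT NULL)"
)
# Composite index in the same column order as the one discussed above.
conn.execute("CREATE UNIQUE INDEX page_ns_title ON page (page_namespace, page_title)")
conn.executemany(
    "INSERT INTO page (page_namespace, page_title) VALUES (?, ?)",
    [(0, "Apple"), (0, "Cherry"), (1, "Apple"), (10, "Banner"), (10, "Infobox")],
)

REVIEWABLE = (0, 10)  # assumed set of reviewable namespaces


def next_page(conn, after, limit):
    """Return up to `limit` rows that sort after the (namespace, title) cursor."""
    ns, title = after
    marks = ",".join("?" * len(REVIEWABLE))
    sql = (
        "SELECT page_namespace, page_title FROM page"
        f" WHERE page_namespace IN ({marks})"
        " AND (page_namespace > ? OR (page_namespace = ? AND page_title > ?))"
        " ORDER BY page_namespace, page_title LIMIT ?"
    )
    return conn.execute(sql, (*REVIEWABLE, ns, ns, title, limit)).fetchall()


first = next_page(conn, (-1, ""), 2)    # start before every real namespace
second = next_page(conn, first[-1], 2)  # continue from the last row seen
```

The cursor is the last (namespace, title) pair of the previous page, so each request resumes in index order instead of re-sorting by title alone.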
Comment 2 Umherirrender 2012-03-04 09:56:28 UTC
PendingChanges also uses this condition, but it causes no problem there, because there is no GROUP BY.

Maybe a parent_id = 0 condition can help to get the first revision of a page, and then the MIN(rev_timestamp) is cheaper or not needed, because in most cases there is only one revision with parent_id = 0 for a page_id.
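
To make this suggestion concrete, here is a small sketch in Python with an in-memory SQLite database. The `revision` table, its column names, and the sample rows are assumptions for the example. It compares a grouped MIN(rev_timestamp) query with a plain filter on a parent id of 0; the two agree only under the stated assumption that each page has exactly one revision whose parent id is 0.

```python
import sqlite3

# Hypothetical, simplified revision table; column names are assumptions.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE revision ("
    " rev_id INTEGER PRIMARY KEY,"
    " rev_page INTEGER NOT NULL,"
    " rev_parent_id INTEGER NOT NULL,"
    " rev_timestamp TEXT NOT NULL)"
)
conn.executemany(
    "INSERT INTO revision VALUES (?, ?, ?, ?)",
    [
        (1, 100, 0, "20120101000000"),  # first revision of page 100
        (2, 100, 1, "20120105000000"),
        (3, 200, 0, "20120110000000"),  # first revision of page 200
        (4, 200, 3, "20120111000000"),
        (5, 200, 4, "20120112000000"),
    ],
)

# Creation time via aggregation: needs a GROUP BY over all revisions.
grouped = conn.execute(
    "SELECT rev_page, MIN(rev_timestamp) FROM revision"
    " GROUP BY rev_page ORDER BY rev_page"
).fetchall()

# Creation time via the "no parent" filter: no aggregation needed.
first_only = conn.execute(
    "SELECT rev_page, rev_timestamp FROM revision"
    " WHERE rev_parent_id = 0 ORDER BY rev_page"
).fetchall()
```

If a page had several revisions with a parent id of 0, the second query would return more than one row for it, so the shortcut would need a fallback for those cases.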
Comment 3 Aaron Schulz 2012-06-28 18:44:31 UTC
PendingChanges uses either the flaggedpages table (pending items are indexed) or the flaggedpages_pending table (which is a denormalization). Both work well for this purpose and don't require scanning massive portions of the page table. The number of pending pages tends to range from zero to a few tens of thousands. Filtering by namespace is thus easy.

The only way to do this well is to page on page_id rather than page_title. It would be confusing to page differently based on whether a namespace is provided, so we would have to always page by page_id, which would be slower if I specify a namespace that only gets a small portion of edits.

UnreviewedPages is really only useful for getting really old pages that still haven't been reviewed. It seems like NewPages would be more useful (filtered for unpatrolled). Maybe that can have a category selector?
Comment 4 Aaron Schulz 2012-07-14 03:35:45 UTC
(In reply to comment #3)

Another problem with paging on page_id is that it increases the average number of rows to be scanned when "oldest" is selected, since older pages are more likely to be reviewed (whereas paging on page_title is more random).

Another trick would be to still page on page_title but do a UNION query (like recentchanges) on all reviewable namespaces. Given how many rows this special page already has to scan, I'd be a bit hesitant to do that either, though it seems to still be fairly fast on dewiki.
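
A minimal sketch of that idea, in Python with an in-memory SQLite database: one indexed query per reviewable namespace, each ordered by title, with the results merged by title. Note that the merge is done in application code here, standing in for the SQL UNION mentioned above. The `page` table layout, the sample rows, the `REVIEWABLE` set, and the `page_by_title` helper are assumptions made for the example.

```python
import heapq
import sqlite3
from itertools import islice

# Hypothetical, stripped-down stand-in for the MediaWiki page table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE page ("
    " page_id INTEGER PRIMARY KEY,"
    " page_namespace INTEGER NOT NULL,"
    " page_title TEXT NOT NULL)"
)
conn.execute("CREATE UNIQUE INDEX page_ns_title ON page (page_namespace, page_title)")
conn.executemany(
    "INSERT INTO page (page_namespace, page_title) VALUES (?, ?)",
    [(0, "Apple"), (0, "Cherry"), (1, "Zebra"), (10, "Apple"), (10, "Banner")],
)

REVIEWABLE = (0, 10)  # assumed set of reviewable namespaces


def page_by_title(conn, after, limit):
    """Return up to `limit` (title, namespace) rows after the cursor, by title."""
    last_title, last_ns = after
    branches = []
    for ns in REVIEWABLE:
        # Ties on title are broken by namespace, so a namespace that sorts
        # after the cursor's namespace may still return the cursor's title.
        op = ">=" if ns > last_ns else ">"
        rows = conn.execute(
            "SELECT page_title, page_namespace FROM page"
            f" WHERE page_namespace = ? AND page_title {op} ?"
            " ORDER BY page_title LIMIT ?",
            (ns, last_title, limit),
        ).fetchall()
        branches.append(rows)
    # Each branch is already sorted by title, so a streaming merge suffices.
    return list(islice(heapq.merge(*branches), limit))


first = page_by_title(conn, ("", -1), 2)
second = page_by_title(conn, first[-1], 2)
```

Each branch fetches at most `limit` rows through the (page_namespace, page_title) index, so the cost grows with the number of reviewable namespaces rather than with the size of the page table.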
Comment 5 Aaron Schulz 2012-07-14 03:44:29 UTC
Closing until there is more demand for this.
