Last modified: 2012-03-29 13:14:31 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and for historical purposes. It is not possible to log in and except for displaying bug reports and their history, links might be broken. See T37349, the corresponding Phabricator task for complete and up-to-date bug report information.
Bug 35349 - usercontribs API: sort results by timestamp,user or user,timestamp
usercontribs API: sort results by timestamp,user or user,timestamp
Status: RESOLVED WONTFIX
Product: MediaWiki
Classification: Unclassified
API (Other open bugs)
unspecified
All All
: Unprioritized enhancement (vote)
: ---
Assigned To: Nobody - You can work on this!
: patch, patch-need-review
Depends on:
Blocks:
  Show dependency treegraph
 
Reported: 2012-03-20 03:12 UTC by Immae
Modified: 2012-03-29 13:14 UTC (History)
8 users (show)

See Also:
Web browser: ---
Mobile Platform: ---
Assignee Huggle Beta Tester: ---


Attachments
patch for usercontribs (1.59 KB, patch)
2012-03-20 03:12 UTC, Immae
Details
second patch, corrected (2.38 KB, patch)
2012-03-20 15:27 UTC, Immae
Details
New ucsort key instead of mixed (2.60 KB, patch)
2012-03-20 23:35 UTC, Immae
Details

Description Immae 2012-03-20 03:12:21 UTC
Created attachment 10286 [details]
patch for usercontribs

Hi,
This is a very small patch (10 lines including description and comments) for the "usercontribs" api, that adds the possibility to sort results by timestamp,user or user,timestamp (depending on a new param "mixed") instead of only user,timestamp.

Default behavior is not modified if the param mixed is not given, thus we keep ascendant compatibility.

Would it be possible to review and submit the patch?

Thanks a lot!
Comment 1 Bryan Tong Minh 2012-03-20 08:50:54 UTC
This might have performance implications if the sort key is not indexed. Could you run the SELECT query with EXPLAIN in front of it, and post the output here?
Comment 2 Sumana Harihareswara 2012-03-20 12:49:12 UTC
Thanks for the patch! Also, you might be interested in getting a Git account so you can submit future patches more easily: https://labsconsole.wikimedia.org/wiki/Help:Access
Comment 3 Immae 2012-03-20 15:26:51 UTC
Hi, thanks for your interest!
The output of the explain select is exactly the same if I run with or without "mixed", are you sure it is the correct test for checking performance ?

id	select_type	table	type	possible_keys	key	key_len	ref	rows	Extra
1	SIMPLE	page	ALL	PRIMARY	NULL	NULL	NULL	1	Using temporary; Using filesort
1	SIMPLE	revision	range	usertext_timestamp	usertext_timestamp	257	NULL	2	Using where; Using join buffer

(and 257 becomes 271 if there is a uccontinue condition)

By the way, I deleted an important part of the patch when I cleared all the changes I made in the file, I just resubmitted - and triple checked that I didn't delete anything else - another one.

@Sumana: The "developper" page said that Git will only work from tomorrow, otherwise I would have tried to use it instead of the instructions with SVN :) . I'll think about it, but maybe I will only be an occasionnal contributor to wikimedia
Comment 4 Immae 2012-03-20 15:27:51 UTC
Created attachment 10289 [details]
second patch, corrected
Comment 5 db [inactive,noenotif] 2012-03-20 20:26:38 UTC
What about a parameter ucsort with the values "user" (as default) and "timestamp" like list=categorymembers with "sortkey" and "timestamp" has, instead of parameter "mixed"?
Comment 6 Immae 2012-03-20 22:37:54 UTC
If there is something already existant like that it would be a good thing to do that yes, do you want me to adapt the patch or will the maintainer do it? (it's only a matter of minutes to do that...)
Comment 7 Mark A. Hershberger 2012-03-20 23:15:53 UTC
(In reply to comment #6)
> If there is something already existant like that it would be a good thing to do
> that yes, do you want me to adapt the patch or will the maintainer do it? (it's
> only a matter of minutes to do that...)

It would be good if you would since it makes your patch immediately more usable to us.
Comment 8 Immae 2012-03-20 23:35:48 UTC
Created attachment 10292 [details]
New ucsort key instead of mixed
Comment 9 Immae 2012-03-20 23:37:50 UTC
No problem, just resubmitted it :)

Since I didn't have any clue that any of you where maintainers or simple users I didn't know whether I reached the correct audience or not, and whether you where interested in the patch or not.
Comment 10 Bryan Tong Minh 2012-03-21 20:57:43 UTC
(In reply to comment #3)
> Hi, thanks for your interest!
> The output of the explain select is exactly the same if I run with or without
> "mixed", are you sure it is the correct test for checking performance ?
> 
> id    select_type    table    type    possible_keys    key    key_len    ref   
> rows    Extra
> 1    SIMPLE    page    ALL    PRIMARY    NULL    NULL    NULL    1    Using
> temporary; Using filesort
> 1    SIMPLE    revision    range    usertext_timestamp    usertext_timestamp   
> 257    NULL    2    Using where; Using join buffer
> 
> (and 257 becomes 271 if there is a uccontinue condition)
> 
Yes, EXPLAIN is normally used to get an indication of the database performance. 
From the indices on the revision table it looks like this should work, but I think somebody who knows more about query performance should have a look at this.
Comment 11 Immae 2012-03-24 23:50:19 UTC
Should I resubmit it to gerrit now that it is open or will it do there?
Comment 12 Sumana Harihareswara 2012-03-25 02:15:15 UTC
Immae, go ahead and submit your change to Git via Gerrit.  Thanks!
Comment 13 Immae 2012-03-27 20:54:58 UTC
https://gerrit.wikimedia.org/r/3825
Comment 14 Immae 2012-03-27 20:56:35 UTC
(was that the correct adress I was supposed to give? It wasn't very clear in the documentation, certainly because of my not-so-good english)
Comment 15 Roan Kattouw 2012-03-29 10:34:43 UTC
(In reply to comment #10)
> (In reply to comment #3)
> > Hi, thanks for your interest!
> > The output of the explain select is exactly the same if I run with or without
> > "mixed", are you sure it is the correct test for checking performance ?
> > 
> > id    select_type    table    type    possible_keys    key    key_len    ref   
> > rows    Extra
> > 1    SIMPLE    page    ALL    PRIMARY    NULL    NULL    NULL    1    Using
> > temporary; Using filesort
> > 1    SIMPLE    revision    range    usertext_timestamp    usertext_timestamp   
> > 257    NULL    2    Using where; Using join buffer
> > 
> > (and 257 becomes 271 if there is a uccontinue condition)
> > 
> Yes, EXPLAIN is normally used to get an indication of the database performance. 
> From the indices on the revision table it looks like this should work, but I
> think somebody who knows more about query performance should have a look at
> this.
Both "ALL" and "Using filesort" are red flags.

Here are my explains from the toolserver:


mysql> explain select page_namespace, page_title, rev_id, rev_timestamp from page, revision where page_id=rev_page and rev_user_text LIKE 'J%' order by rev_user_text, rev_timestamp limit 51 \G 
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: revision
         type: range
possible_keys: PRIMARY,page_timestamp,usertext_timestamp
          key: usertext_timestamp
      key_len: 257
          ref: NULL
         rows: 46784760
        Extra: Using where; Using index
*************************** 2. row ***************************
           id: 1
  select_type: SIMPLE
        table: page
         type: eq_ref
possible_keys: PRIMARY
          key: PRIMARY
      key_len: 4
          ref: enwiki.revision.rev_page
         rows: 1
        Extra: 
2 rows in set (0.00 sec)


mysql> explain select page_namespace, page_title, rev_id, rev_timestamp from page, revision where page_id=rev_page and rev_user_text LIKE 'J%' order by rev_timestamp, rev_user_text limit 51 \G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: revision
         type: range
possible_keys: PRIMARY,page_timestamp,usertext_timestamp
          key: usertext_timestamp
      key_len: 257
          ref: NULL
         rows: 46784760
        Extra: Using where; Using index; Using filesort
*************************** 2. row ***************************
           id: 1
  select_type: SIMPLE
        table: page
         type: eq_ref
possible_keys: PRIMARY
          key: PRIMARY
      key_len: 4
          ref: enwiki.revision.rev_page
         rows: 1
        Extra: 
2 rows in set (0.00 sec)

Here's what happened when I ran these queries on the toolserver:

mysql> select page_namespace, page_title, rev_id, rev_timestamp from page, revision where page_id=rev_page and rev_user_text LIKE 'J%' order by rev_user_text, rev_timestamp limit 51;
[...]
51 rows in set (0.00 sec)

mysql> select page_namespace, page_title, rev_id, rev_timestamp from page, revision where page_id=rev_page and rev_user_text LIKE 'J%' order by rev_timestamp, rev_user_text limit 51;
[after about a minute]
^CCtrl-C -- sending "KILL QUERY 9922979" to server ...
Ctrl-C -- query aborted.
ERROR 1028 (HY000): Sort aborted

This makes sense when you consider how MySQL is executing this query: it fetches all rows whose user starts with a J (according to EXPLAIN there are approximately 46 million of these), then sorts those by (timestamp,user) using a slow filesort (unindexed sort).

So yes, this patch has major performance implications. An API query like ucuserprefix=J&ucsort=timestamp would result in an SQL query that takes over a minute to execute.
Comment 16 Immae 2012-03-29 10:43:48 UTC
Hi,
Thanks for your explanations.

Would it be the same if we use ucuser instead of ucuserprefix? (i.e. we forbid to use it on ucuserprefix?)

Maybe the discussion should continue on gerrit since I reposted it there?
Comment 17 Immae 2012-03-29 10:49:45 UTC
If I understood well what you said, the problem is when mysql tries to sort the result, which is much more smaller when there are only a limited number of users?
How long would a query take in that last case?
Comment 18 Roan Kattouw 2012-03-29 11:02:14 UTC
(In reply to comment #16)
> Hi,
> Thanks for your explanations.
> 
> Would it be the same if we use ucuser instead of ucuserprefix? (i.e. we forbid
> to use it on ucuserprefix?)
> 
> Maybe the discussion should continue on gerrit since I reposted it there?
With a single user it would work, but then sorting by (user,timestamp) produces the same output as sorting by (timestamp,user) because the value of 'user' is always the same. As soon as you allow multiple users, you run into the slow query problem: the 500 most active users on enwiki probably account for a couple million edits.
Comment 19 Immae 2012-03-29 11:05:41 UTC
Ok thanks.
I'll try to close the "bugreport" if I manage to...

Note You need to log in before you can comment on or make changes to this bug.


Navigation
Links