Last modified: 2012-03-29 13:14:31 UTC
Created attachment 10286 [details] patch for usercontribs Hi, This is a very small patch (10 lines including description and comments) for the "usercontribs" api, that adds the possibility to sort results by timestamp,user or user,timestamp (depending on a new param "mixed") instead of only user,timestamp. Default behavior is not modified if the param mixed is not given, thus we keep ascendant compatibility. Would it be possible to review and submit the patch? Thanks a lot!
This might have performance implications if the sort key is not indexed. Could you run the SELECT query with EXPLAIN in front of it, and post the output here?
Thanks for the patch! Also, you might be interested in getting a Git account so you can submit future patches more easily: https://labsconsole.wikimedia.org/wiki/Help:Access
Hi, thanks for your interest! The output of the explain select is exactly the same if I run with or without "mixed", are you sure it is the correct test for checking performance ? id select_type table type possible_keys key key_len ref rows Extra 1 SIMPLE page ALL PRIMARY NULL NULL NULL 1 Using temporary; Using filesort 1 SIMPLE revision range usertext_timestamp usertext_timestamp 257 NULL 2 Using where; Using join buffer (and 257 becomes 271 if there is a uccontinue condition) By the way, I deleted an important part of the patch when I cleared all the changes I made in the file, I just resubmitted - and triple checked that I didn't delete anything else - another one. @Sumana: The "developper" page said that Git will only work from tomorrow, otherwise I would have tried to use it instead of the instructions with SVN :) . I'll think about it, but maybe I will only be an occasionnal contributor to wikimedia
Created attachment 10289 [details] second patch, corrected
What about a parameter ucsort with the values "user" (as default) and "timestamp" like list=categorymembers with "sortkey" and "timestamp" has, instead of parameter "mixed"?
If there is something already existant like that it would be a good thing to do that yes, do you want me to adapt the patch or will the maintainer do it? (it's only a matter of minutes to do that...)
(In reply to comment #6) > If there is something already existant like that it would be a good thing to do > that yes, do you want me to adapt the patch or will the maintainer do it? (it's > only a matter of minutes to do that...) It would be good if you would since it makes your patch immediately more usable to us.
Created attachment 10292 [details] New ucsort key instead of mixed
No problem, just resubmitted it :) Since I didn't have any clue that any of you where maintainers or simple users I didn't know whether I reached the correct audience or not, and whether you where interested in the patch or not.
(In reply to comment #3) > Hi, thanks for your interest! > The output of the explain select is exactly the same if I run with or without > "mixed", are you sure it is the correct test for checking performance ? > > id select_type table type possible_keys key key_len ref > rows Extra > 1 SIMPLE page ALL PRIMARY NULL NULL NULL 1 Using > temporary; Using filesort > 1 SIMPLE revision range usertext_timestamp usertext_timestamp > 257 NULL 2 Using where; Using join buffer > > (and 257 becomes 271 if there is a uccontinue condition) > Yes, EXPLAIN is normally used to get an indication of the database performance. From the indices on the revision table it looks like this should work, but I think somebody who knows more about query performance should have a look at this.
Should I resubmit it to gerrit now that it is open or will it do there?
Immae, go ahead and submit your change to Git via Gerrit. Thanks!
https://gerrit.wikimedia.org/r/3825
(was that the correct adress I was supposed to give? It wasn't very clear in the documentation, certainly because of my not-so-good english)
(In reply to comment #10) > (In reply to comment #3) > > Hi, thanks for your interest! > > The output of the explain select is exactly the same if I run with or without > > "mixed", are you sure it is the correct test for checking performance ? > > > > id select_type table type possible_keys key key_len ref > > rows Extra > > 1 SIMPLE page ALL PRIMARY NULL NULL NULL 1 Using > > temporary; Using filesort > > 1 SIMPLE revision range usertext_timestamp usertext_timestamp > > 257 NULL 2 Using where; Using join buffer > > > > (and 257 becomes 271 if there is a uccontinue condition) > > > Yes, EXPLAIN is normally used to get an indication of the database performance. > From the indices on the revision table it looks like this should work, but I > think somebody who knows more about query performance should have a look at > this. Both "ALL" and "Using filesort" are red flags. Here are my explains from the toolserver: mysql> explain select page_namespace, page_title, rev_id, rev_timestamp from page, revision where page_id=rev_page and rev_user_text LIKE 'J%' order by rev_user_text, rev_timestamp limit 51 \G *************************** 1. row *************************** id: 1 select_type: SIMPLE table: revision type: range possible_keys: PRIMARY,page_timestamp,usertext_timestamp key: usertext_timestamp key_len: 257 ref: NULL rows: 46784760 Extra: Using where; Using index *************************** 2. row *************************** id: 1 select_type: SIMPLE table: page type: eq_ref possible_keys: PRIMARY key: PRIMARY key_len: 4 ref: enwiki.revision.rev_page rows: 1 Extra: 2 rows in set (0.00 sec) mysql> explain select page_namespace, page_title, rev_id, rev_timestamp from page, revision where page_id=rev_page and rev_user_text LIKE 'J%' order by rev_timestamp, rev_user_text limit 51 \G *************************** 1. row *************************** id: 1 select_type: SIMPLE table: revision type: range possible_keys: PRIMARY,page_timestamp,usertext_timestamp key: usertext_timestamp key_len: 257 ref: NULL rows: 46784760 Extra: Using where; Using index; Using filesort *************************** 2. row *************************** id: 1 select_type: SIMPLE table: page type: eq_ref possible_keys: PRIMARY key: PRIMARY key_len: 4 ref: enwiki.revision.rev_page rows: 1 Extra: 2 rows in set (0.00 sec) Here's what happened when I ran these queries on the toolserver: mysql> select page_namespace, page_title, rev_id, rev_timestamp from page, revision where page_id=rev_page and rev_user_text LIKE 'J%' order by rev_user_text, rev_timestamp limit 51; [...] 51 rows in set (0.00 sec) mysql> select page_namespace, page_title, rev_id, rev_timestamp from page, revision where page_id=rev_page and rev_user_text LIKE 'J%' order by rev_timestamp, rev_user_text limit 51; [after about a minute] ^CCtrl-C -- sending "KILL QUERY 9922979" to server ... Ctrl-C -- query aborted. ERROR 1028 (HY000): Sort aborted This makes sense when you consider how MySQL is executing this query: it fetches all rows whose user starts with a J (according to EXPLAIN there are approximately 46 million of these), then sorts those by (timestamp,user) using a slow filesort (unindexed sort). So yes, this patch has major performance implications. An API query like ucuserprefix=J&ucsort=timestamp would result in an SQL query that takes over a minute to execute.
Hi, Thanks for your explanations. Would it be the same if we use ucuser instead of ucuserprefix? (i.e. we forbid to use it on ucuserprefix?) Maybe the discussion should continue on gerrit since I reposted it there?
If I understood well what you said, the problem is when mysql tries to sort the result, which is much more smaller when there are only a limited number of users? How long would a query take in that last case?
(In reply to comment #16) > Hi, > Thanks for your explanations. > > Would it be the same if we use ucuser instead of ucuserprefix? (i.e. we forbid > to use it on ucuserprefix?) > > Maybe the discussion should continue on gerrit since I reposted it there? With a single user it would work, but then sorting by (user,timestamp) produces the same output as sorting by (timestamp,user) because the value of 'user' is always the same. As soon as you allow multiple users, you run into the slow query problem: the 500 most active users on enwiki probably account for a couple million edits.
Ok thanks. I'll try to close the "bugreport" if I manage to...