Last modified: 2013-03-12 09:03:42 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and for historical purposes. It is not possible to log in and except for displaying bug reports and their history, links might be broken. See T31287, the corresponding Phabricator task for complete and up-to-date bug report information.
Bug 29287 - Implement autosuggest through ajax for CodeReview "path" input field
Implement autosuggest through ajax for CodeReview "path" input field
Status: RESOLVED WONTFIX
Product: MediaWiki extensions
Classification: Unclassified
CodeReview (Other open bugs)
unspecified
All All
: Low enhancement (vote)
: ---
Assigned To: Nobody - You can work on this!
:
Depends on:
Blocks:
  Show dependency treegraph
 
Reported: 2011-06-06 18:03 UTC by Antoine "hashar" Musso (WMF)
Modified: 2013-03-12 09:03 UTC (History)
4 users (show)

See Also:
Web browser: ---
Mobile Platform: ---
Assignee Huggle Beta Tester: ---


Attachments

Description Antoine "hashar" Musso (WMF) 2011-06-06 18:03:27 UTC
The Path filtering input box does not sugest any path.  A simple callback against the API could show a list of 4-5 paths to select from.  Much like the general search input box :)

Might need an update of the API to support searching for a path prefix.
Comment 1 Brion Vibber 2011-06-06 18:05:52 UTC
Oh hell yes that'd be awesome. :D

Ought to be able to do prefix matches against the records in the DB, but not sure how efficient the current indexes are for that usage.
Comment 2 Sam Reed (reedy) 2011-06-06 18:13:41 UTC
mysql> select count( distinct cp_path ) from mw_code_paths where cp_repo_id = 1
+---------------------------+
| count( distinct cp_path ) |
+---------------------------+
|                    141817 |
+---------------------------+
1 row in set (2.44 sec)


Would you like them all in one query? =D
Comment 3 Sam Reed (reedy) 2011-06-06 18:51:08 UTC
Adding

mysql> CREATE INDEX repo_path ON mw_code_paths (cp_repo_id, cp_path);
Query OK, 1177840 rows affected (10 min 54.57 sec)
Records: 1177840  Duplicates: 0  Warnings: 0

gives


mysql> describe select distinct cp_path from mw_code_paths where cp_repo_id = 1 AND cp_path LIKE '/trunk/%';
+----+-------------+---------------+------+-------------------+---------+---------+-------+--------+-------------------------------------------+
| id | select_type | table         | type | possible_keys     | key     | key_len | ref   | rows   | Extra                                     |
+----+-------------+---------------+------+-------------------+---------+---------+-------+--------+-------------------------------------------+
|  1 | SIMPLE      | mw_code_paths | ref  | PRIMARY,repo_path | PRIMARY | 4       | const | 635909 | Using where; Using index; Using temporary |
+----+-------------+---------------+------+-------------------+---------+---------+-------+--------+-------------------------------------------+
1 row in set (0.00 sec)


rather than


mysql> describe select distinct cp_path from mw_code_paths where cp_repo_id = 1 AND cp_path LIKE '/trunk/%';
+----+-------------+---------------+------+---------------+---------+---------+-------+--------+-------------------------------------------+
| id | select_type | table         | type | possible_keys | key     | key_len | ref   | rows   | Extra                                     |
+----+-------------+---------------+------+---------------+---------+---------+-------+--------+-------------------------------------------+
|  1 | SIMPLE      | mw_code_paths | ref  | PRIMARY       | PRIMARY | 4       | const | 657813 | Using where; Using index; Using temporary |
+----+-------------+---------------+------+---------------+---------+---------+-------+--------+-------------------------------------------+
1 row in set (0.00 sec)


It took 10 minutes to add the index, but seems possibly worthwhile, as we're saving 24,000 rows..


mysql> drop index repo_path on mw_code_paths;
Query OK, 1177840 rows affected (1 min 28.05 sec)
Records: 1177840  Duplicates: 0  Warnings: 0


Unless there's a better index we can have?


Either way it gives using temporary, which isn't nice...
Comment 4 Sam Reed (reedy) 2011-06-06 19:02:01 UTC
Basic api module added in r89588
Comment 5 Brion Vibber 2011-06-06 19:02:01 UTC
Those row counts are estimates to begin with, and the difference is pretty tiny (636k rows vs 658k rows? same order magnitude). I wouldn't assume that to help all by itself...

However it looks like the extension doesn't actually add a 'repo_path' key.

The primary key as currently defined consists of (cp_repo_id, cp_rev_id, cp_path).

cp_repo_id is a constant (within the given repository we're working with), but cp_rev_id will vary. If we're returning filtered results listed by time -- as we are for what you actually type in -- then that's pretty good. But if we're trying to let someone type an arbitrary path, we probably want to bias to the *path prefix* not to *recent revision ids*.

Searching based on an index built on (cp_repo_id, cp_path) will give more efficient path lookups, but probably isn't super efficient either because it'll have a huge number of duplicate (for our purposes) rows listing the same paths over and over.

We probably don't want the DB to go through all 600k+ matching rows building a DISTINCT set (which it might, or might not, be smart enough to optimize out).
Comment 6 Sam Reed (reedy) 2011-06-06 19:10:53 UTC
Index added in r89589
Comment 7 Antoine "hashar" Musso (WMF) 2011-06-07 13:37:14 UTC
In your queries at comment 3, both make use of PRIMARY.

The first query has:
 possible_keys:  PRIMARY,repo_path
 key          :  PRIMARY

The second query has:
 possible_keys:  PRIMARY
 key          :  PRIMARY

Maybe retry with a 'FORCE INDEX repo_path' clause?
Comment 8 Sam Reed (reedy) 2011-06-07 13:41:45 UTC
mysql> describe select distinct cp_path from mw_code_paths repo_path where cp_repo_id = 1 AND cp_path LIKE '/trunk/%'
+----+-------------+-----------+------+-------------------+---------+---------+-------+--------+-------------------------------------------+
| id | select_type | table     | type | possible_keys     | key     | key_len | ref   | rows   | Extra                                     |
+----+-------------+-----------+------+-------------------+---------+---------+-------+--------+-------------------------------------------+
|  1 | SIMPLE      | repo_path | ref  | PRIMARY,repo_path | PRIMARY | 4       | const | 640157 | Using where; Using index; Using temporary |
+----+-------------+-----------+------+-------------------+---------+---------+-------+--------+-------------------------------------------+
1 row in set (0.00 sec)

mysql> describe select distinct cp_path from mw_code_paths force index (repo_path) where cp_repo_id = 1 AND cp_path LIKE '/trunk/%'
+----+-------------+---------------+-------+---------------+-----------+---------+------+--------+--------------------------+
| id | select_type | table         | type  | possible_keys | key       | key_len | ref  | rows   | Extra                    |
+----+-------------+---------------+-------+---------------+-----------+---------+------+--------+--------------------------+
|  1 | SIMPLE      | mw_code_paths | range | repo_path     | repo_path | 261     | NULL | 640157 | Using where; Using index |
+----+-------------+---------------+-------+---------------+-----------+---------+------+--------+--------------------------+

Yay, that looks better as it kills the temporary. Will commit it! :)
Comment 9 Antoine "hashar" Musso (WMF) 2011-06-07 13:55:26 UTC
You still hit a lot of row (640k).  Notice the key length is 261!

Maybe the index should be created using a shorter length (will have to figure out a correct value).

Maybe 24 character long which maps to: 
'/trunk/phase3/includes/X'
'/trunk/extensions/XXXXXX'

code_paths (cp_repo_id(24), cp_path)

Or 35 characters to includes branches:
'/branches/REL1_18/phase3/includes/X'
'/branches/REL1_18/extensions/XXXXXX'

code_paths (cp_repo_id(35), cp_path)

Will have to play a bit with them or use this:

 echo 'Explain <QUERY HERE>' | sql | domas

:)
Comment 10 Antoine "hashar" Musso (WMF) 2011-06-07 14:30:49 UTC
The above index snippets should read as :

 code_paths (cp_repo_id, cp_path(24) )
 code_paths (cp_repo_id, cp_path(35) )
Comment 11 Roan Kattouw 2011-06-07 20:45:00 UTC
(In reply to comment #9)
> You still hit a lot of row (640k).  Notice the key length is 261!
> 
That's not really relevant, is it? It just means the fields covered by the key are 261 bytes in total.

> Maybe the index should be created using a shorter length (will have to figure
> out a correct value).
> 
> Maybe 24 character long which maps to: 
> '/trunk/phase3/includes/X'
> '/trunk/extensions/XXXXXX'
> 
> code_paths (cp_repo_id(24), cp_path)
> 
> Or 35 characters to includes branches:
> '/branches/REL1_18/phase3/includes/X'
> '/branches/REL1_18/extensions/XXXXXX'
> 
> code_paths (cp_repo_id(35), cp_path)
> 
You could do that to make the index take less space, yes, but AFAIK MySQL isn't very smart when dealing with partial indexes like those, and will refuse to use the entire index if it can't be used to satisfy something completely. For that reason, I'm pretty sure it won't use the index for ORDER BY; it might use it for LIKE '/trunk/%' provided the prefix fits in the index, but I'm not sure.
Comment 12 Antoine "hashar" Musso (WMF) 2011-07-26 18:17:11 UTC
r89645 added USE INDEX 'repo_path'
Comment 13 Antoine "hashar" Musso (WMF) 2011-11-10 10:06:35 UTC
lowering priority of CodeReview bugs I have opened or that are assigned to me.
Comment 14 Antoine "hashar" Musso (WMF) 2013-03-12 09:03:42 UTC
The CodeReview extension is abandoned. No point in implementing that feature.

Note You need to log in before you can comment on or make changes to this bug.


Navigation
Links