Last modified: 2013-03-12 09:03:42 UTC
The Path filtering input box does not sugest any path. A simple callback against the API could show a list of 4-5 paths to select from. Much like the general search input box :) Might need an update of the API to support searching for a path prefix.
Oh hell yes that'd be awesome. :D Ought to be able to do prefix matches against the records in the DB, but not sure how efficient the current indexes are for that usage.
mysql> select count( distinct cp_path ) from mw_code_paths where cp_repo_id = 1 +---------------------------+ | count( distinct cp_path ) | +---------------------------+ | 141817 | +---------------------------+ 1 row in set (2.44 sec) Would you like them all in one query? =D
Adding mysql> CREATE INDEX repo_path ON mw_code_paths (cp_repo_id, cp_path); Query OK, 1177840 rows affected (10 min 54.57 sec) Records: 1177840 Duplicates: 0 Warnings: 0 gives mysql> describe select distinct cp_path from mw_code_paths where cp_repo_id = 1 AND cp_path LIKE '/trunk/%'; +----+-------------+---------------+------+-------------------+---------+---------+-------+--------+-------------------------------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+---------------+------+-------------------+---------+---------+-------+--------+-------------------------------------------+ | 1 | SIMPLE | mw_code_paths | ref | PRIMARY,repo_path | PRIMARY | 4 | const | 635909 | Using where; Using index; Using temporary | +----+-------------+---------------+------+-------------------+---------+---------+-------+--------+-------------------------------------------+ 1 row in set (0.00 sec) rather than mysql> describe select distinct cp_path from mw_code_paths where cp_repo_id = 1 AND cp_path LIKE '/trunk/%'; +----+-------------+---------------+------+---------------+---------+---------+-------+--------+-------------------------------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+---------------+------+---------------+---------+---------+-------+--------+-------------------------------------------+ | 1 | SIMPLE | mw_code_paths | ref | PRIMARY | PRIMARY | 4 | const | 657813 | Using where; Using index; Using temporary | +----+-------------+---------------+------+---------------+---------+---------+-------+--------+-------------------------------------------+ 1 row in set (0.00 sec) It took 10 minutes to add the index, but seems possibly worthwhile, as we're saving 24,000 rows.. mysql> drop index repo_path on mw_code_paths; Query OK, 1177840 rows affected (1 min 28.05 sec) Records: 1177840 Duplicates: 0 Warnings: 0 Unless there's a better index we can have? Either way it gives using temporary, which isn't nice...
Basic api module added in r89588
Those row counts are estimates to begin with, and the difference is pretty tiny (636k rows vs 658k rows? same order magnitude). I wouldn't assume that to help all by itself... However it looks like the extension doesn't actually add a 'repo_path' key. The primary key as currently defined consists of (cp_repo_id, cp_rev_id, cp_path). cp_repo_id is a constant (within the given repository we're working with), but cp_rev_id will vary. If we're returning filtered results listed by time -- as we are for what you actually type in -- then that's pretty good. But if we're trying to let someone type an arbitrary path, we probably want to bias to the *path prefix* not to *recent revision ids*. Searching based on an index built on (cp_repo_id, cp_path) will give more efficient path lookups, but probably isn't super efficient either because it'll have a huge number of duplicate (for our purposes) rows listing the same paths over and over. We probably don't want the DB to go through all 600k+ matching rows building a DISTINCT set (which it might, or might not, be smart enough to optimize out).
Index added in r89589
In your queries at comment 3, both make use of PRIMARY. The first query has: possible_keys: PRIMARY,repo_path key : PRIMARY The second query has: possible_keys: PRIMARY key : PRIMARY Maybe retry with a 'FORCE INDEX repo_path' clause?
mysql> describe select distinct cp_path from mw_code_paths repo_path where cp_repo_id = 1 AND cp_path LIKE '/trunk/%' +----+-------------+-----------+------+-------------------+---------+---------+-------+--------+-------------------------------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+-----------+------+-------------------+---------+---------+-------+--------+-------------------------------------------+ | 1 | SIMPLE | repo_path | ref | PRIMARY,repo_path | PRIMARY | 4 | const | 640157 | Using where; Using index; Using temporary | +----+-------------+-----------+------+-------------------+---------+---------+-------+--------+-------------------------------------------+ 1 row in set (0.00 sec) mysql> describe select distinct cp_path from mw_code_paths force index (repo_path) where cp_repo_id = 1 AND cp_path LIKE '/trunk/%' +----+-------------+---------------+-------+---------------+-----------+---------+------+--------+--------------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+---------------+-------+---------------+-----------+---------+------+--------+--------------------------+ | 1 | SIMPLE | mw_code_paths | range | repo_path | repo_path | 261 | NULL | 640157 | Using where; Using index | +----+-------------+---------------+-------+---------------+-----------+---------+------+--------+--------------------------+ Yay, that looks better as it kills the temporary. Will commit it! :)
You still hit a lot of row (640k). Notice the key length is 261! Maybe the index should be created using a shorter length (will have to figure out a correct value). Maybe 24 character long which maps to: '/trunk/phase3/includes/X' '/trunk/extensions/XXXXXX' code_paths (cp_repo_id(24), cp_path) Or 35 characters to includes branches: '/branches/REL1_18/phase3/includes/X' '/branches/REL1_18/extensions/XXXXXX' code_paths (cp_repo_id(35), cp_path) Will have to play a bit with them or use this: echo 'Explain <QUERY HERE>' | sql | domas :)
The above index snippets should read as : code_paths (cp_repo_id, cp_path(24) ) code_paths (cp_repo_id, cp_path(35) )
(In reply to comment #9) > You still hit a lot of row (640k). Notice the key length is 261! > That's not really relevant, is it? It just means the fields covered by the key are 261 bytes in total. > Maybe the index should be created using a shorter length (will have to figure > out a correct value). > > Maybe 24 character long which maps to: > '/trunk/phase3/includes/X' > '/trunk/extensions/XXXXXX' > > code_paths (cp_repo_id(24), cp_path) > > Or 35 characters to includes branches: > '/branches/REL1_18/phase3/includes/X' > '/branches/REL1_18/extensions/XXXXXX' > > code_paths (cp_repo_id(35), cp_path) > You could do that to make the index take less space, yes, but AFAIK MySQL isn't very smart when dealing with partial indexes like those, and will refuse to use the entire index if it can't be used to satisfy something completely. For that reason, I'm pretty sure it won't use the index for ORDER BY; it might use it for LIKE '/trunk/%' provided the prefix fits in the index, but I'm not sure.
r89645 added USE INDEX 'repo_path'
lowering priority of CodeReview bugs I have opened or that are assigned to me.
The CodeReview extension is abandoned. No point in implementing that feature.