Last modified: 2011-03-05 15:33:20 UTC
See example link, there were more commits under that path. Reproduceable on other paths, too.
As it turned out, it's on purpose, see CodeRevision::getPathConds(): // performance 'cp_rev_id > ' . ( $this->mRepo->getLastStoredRev() - 20000 ), Marking this as dependent on bug 24479 which should address DB performance.
mysql> describe SELECT cp_rev_id,cr_id,cr_repo_id,cr_status,COUNT(DISTINCT cc_id) AS comments,cr_path,cr_message,cr_author,cr_timestamp FROM `mw_code_paths` INNER JOIN `mw_code_rev` ON ((cr_repo_id = cp_repo_id AND cr_id = cp_rev_id)) LEFT JOIN `mw_code_comment` ON ((cc_repo_id = cp_repo_id AND cc_rev_id = cp_rev_id)) WHERE cp_repo_id = '3' AND (cp_path LIKE '/trunk/extensions/Cite/cite\_text%' ) AND (cp_rev_id > 58352) GROUP BY cp_rev_id ORDER BY cp_rev_id; +----+-------------+-----------------+-------+-----------------------------------------+------------+---------+----------------------------------------------------------+-------+----------------------------------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+-----------------+-------+-----------------------------------------+------------+---------+----------------------------------------------------------+-------+----------------------------------------------+ | 1 | SIMPLE | mw_code_rev | range | PRIMARY,cr_repo_id,cr_repo_author,cr_id | PRIMARY | 8 | NULL | 21804 | Using where; Using temporary; Using filesort | | 1 | SIMPLE | mw_code_paths | ref | PRIMARY | PRIMARY | 8 | const,wikidb.mw_code_rev.cr_id | 2 | Using where; Using index | | 1 | SIMPLE | mw_code_comment | ref | cc_repo_id,cc_repo_time | cc_repo_id | 8 | wikidb.mw_code_paths.cp_repo_id,wikidb.mw_code_rev.cr_id | 1 | Using index | +----+-------------+-----------------+-------+-----------------------------------------+------------+---------+----------------------------------------------------------+-------+----------------------------------------------+ 3 rows in set (0.00 sec) mysql> describe SELECT cp_rev_id,cr_id,cr_repo_id,cr_status,COUNT(DISTINCT cc_id) AS comments,cr_path,cr_message,cr_author,cr_timestamp FROM `mw_code_paths` INNER JOIN `mw_code_rev` ON ((cr_repo_id = cp_repo_id AND cr_id = cp_rev_id)) LEFT JOIN `mw_code_comment` ON ((cc_repo_id = cp_repo_id AND cc_rev_id = cp_rev_id)) WHERE cp_repo_id = '3' AND (cp_path LIKE '/trunk/extensions/Cite/cite\_text%' ) GROUP BY cp_rev_id ORDER BY cp_rev_id; +----+-------------+-----------------+------+-----------------------------------------+------------+---------+----------------------------------------------------------+-------+---------------------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+-----------------+------+-----------------------------------------+------------+---------+----------------------------------------------------------+-------+---------------------------------+ | 1 | SIMPLE | mw_code_rev | ref | PRIMARY,cr_repo_id,cr_repo_author,cr_id | PRIMARY | 4 | const | 49838 | Using temporary; Using filesort | | 1 | SIMPLE | mw_code_paths | ref | PRIMARY | PRIMARY | 8 | const,wikidb.mw_code_rev.cr_id | 2 | Using where; Using index | | 1 | SIMPLE | mw_code_comment | ref | cc_repo_id,cc_repo_time | cc_repo_id | 8 | wikidb.mw_code_paths.cp_repo_id,wikidb.mw_code_rev.cr_id | 1 | Using index | +----+-------------+-----------------+------+-----------------------------------------+------------+---------+----------------------------------------------------------+-------+---------------------------------+ 3 rows in set (0.00 sec)
As Roan suggests, this should be fixed by storing paths more redundantly, so the LIKE can be made into an equality check. Otherwise there's no way to get it to play nice with the grouping/ordering.
(In reply to comment #3) > As Roan suggests, this should be fixed by storing paths more redundantly, so > the LIKE can be made into an equality check. Otherwise there's no way to get > it to play nice with the grouping/ordering. Elaborating on my suggestion on IRC: Currently we only store the path of each changed file in the paths table. If instead we also stored the paths of all of their ancestors, we could simply do WHERE cp_path='/trunk/phase3' instead of WHERE cp_path LIKE '/trunk/phase3%' , which will search directories recursively. For example, for a commit that touches ApiQueryBase.php and MessagesEn.php, we'd store the following paths: /trunk/phase3/includes/api/ApiQueryBase.php /trunk/phase3/includes/api /trunk/phase3/includes /trunk/phase3/languages/messages/MessagesEn.php /trunk/phase3/languages/messages /trunk/phase3/languages /trunk/phase3 /trunk This would improve query performance at the expense of table size.
Interesting. We'd also need to presumably back populate this for the old revs... No database changes, just some more coding changes to split stuff down...
(In reply to comment #5) > Interesting. We'd also need to presumably back populate this for the old > revs... > Yes. I could do that locally and measure the size increase that way. > No database changes, just some more coding changes to split stuff down... Exactly.
r79379, r79381, r79382, r79384... Before: mysql> select count(*) from mw_code_paths WHERE cp_repo_id = 3 AND cp_rev_id > 1 AND cp_rev_id < 1000; +----------+ | count(*) | +----------+ | 2580 | +----------+ 1 row in set (0.02 sec) After: mysql> select count(*) from mw_code_paths WHERE cp_repo_id = 3 AND cp_rev_id > 1 AND cp_rev_id < 1000; +----------+ | count(*) | +----------+ | 6600 | +----------+ 1 row in set (0.00 sec) After the first 1000 done mysql> select count(*) from mw_code_paths WHERE cp_repo_id = 3; +----------+ | count(*) | +----------+ | 516313 | +----------+ 1 row in set (0.26 sec) After the whole repo (78352 locally) mysql> select count(*) from mw_code_paths WHERE cp_repo_id = 3; +----------+ | count(*) | +----------+ | 988825 | +----------+ 1 row in set (1.93 sec) Not really fast, but hardly slow either. Took a few minutes to execute