Last modified: 2012-01-18 14:42:02 UTC
Currently, there are more than 1000 page links which reference non-existing pages in the Portuguese Wiktionary: select distinct pl_from from pagelinks left join page on page_id=pl_from where page_title is null; 1056 rows fetched Some of the deleted page ids are: 2447, 3364, 3605 Please, delete these erroneous records. They are influencing manual "wanted page" generation.
mysql> explain select distinct pl_from from pagelinks left join page on page_id=pl_from where page_title is null +----+-------------+-----------+--------+---------------+---------+---------+--------------------------+----------+-----------------------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+-----------+--------+---------------+---------+---------+--------------------------+----------+-----------------------------------+ | 1 | SIMPLE | pagelinks | index | NULL | pl_from | 265 | NULL | 65287380 | Using index; Using temporary | | 1 | SIMPLE | page | eq_ref | PRIMARY | PRIMARY | 4 | ptwiki.pagelinks.pl_from | 1 | Using where; Not exists; Distinct | +----+-------------+-----------+--------+---------------+---------+---------+--------------------------+----------+-----------------------------------+ 2 rows in set (0.00 sec) Can you give me a CSV list of page ids? And I can just do a delete where in list, as I'm not running that query on the cluster
Created attachment 9860 [details] CSV of deleted page ids Deleted pages ids
The query takes about 7 seconds to run on my desktop computer, but here it is, a CSV file attached. Please double check that the SQL query I used yields what is intended. Thanks.
Done
Thanks!