Last modified: 2012-01-18 14:42:02 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and for historical purposes. It is not possible to log in and except for displaying bug reports and their history, links might be broken. See T35540, the corresponding Phabricator task for complete and up-to-date bug report information.
Bug 33540 - Cleanup old "pagelinks" records which reference deleted pages in the Portuguese Wiktionary
Cleanup old "pagelinks" records which reference deleted pages in the Portugue...
Status: RESOLVED FIXED
Product: Wikimedia
Classification: Unclassified
Site requests (Other open bugs)
unspecified
All All
: Normal normal (vote)
: ---
Assigned To: Nobody - You can work on this!
: shell
Depends on:
Blocks:
  Show dependency treegraph
 
Reported: 2012-01-05 15:05 UTC by Malafaya
Modified: 2012-01-18 14:42 UTC (History)
1 user (show)

See Also:
Web browser: ---
Mobile Platform: ---
Assignee Huggle Beta Tester: ---


Attachments
CSV of deleted page ids (7.19 KB, application/vnd.ms-excel)
2012-01-17 18:12 UTC, Malafaya
Details

Description Malafaya 2012-01-05 15:05:28 UTC
Currently, there are more than 1000 page links which reference non-existing pages in the Portuguese Wiktionary:


select distinct pl_from from pagelinks left join page on
page_id=pl_from where page_title is null;

1056 rows fetched


Some of the deleted page ids are: 2447, 3364, 3605

Please, delete these erroneous records. They are influencing manual "wanted page" generation.
Comment 1 Sam Reed (reedy) 2012-01-13 00:42:11 UTC
mysql> explain select distinct pl_from from pagelinks left join page on page_id=pl_from where page_title is null
+----+-------------+-----------+--------+---------------+---------+---------+--------------------------+----------+-----------------------------------+
| id | select_type | table     | type   | possible_keys | key     | key_len | ref                      | rows     | Extra                             |
+----+-------------+-----------+--------+---------------+---------+---------+--------------------------+----------+-----------------------------------+
|  1 | SIMPLE      | pagelinks | index  | NULL          | pl_from | 265     | NULL                     | 65287380 | Using index; Using temporary      |
|  1 | SIMPLE      | page      | eq_ref | PRIMARY       | PRIMARY | 4       | ptwiki.pagelinks.pl_from |        1 | Using where; Not exists; Distinct |
+----+-------------+-----------+--------+---------------+---------+---------+--------------------------+----------+-----------------------------------+
2 rows in set (0.00 sec)

Can you give me a CSV list of page ids? And I can just do a delete where in list, as I'm not running that query on the cluster
Comment 2 Malafaya 2012-01-17 18:12:44 UTC
Created attachment 9860 [details]
CSV of deleted page ids

Deleted pages ids
Comment 3 Malafaya 2012-01-17 18:14:13 UTC
The query takes about 7 seconds to run on my desktop computer, but here it is, a CSV file attached. Please double check that the SQL query I used yields what is intended. Thanks.
Comment 4 Sam Reed (reedy) 2012-01-17 23:24:27 UTC
Done
Comment 5 Malafaya 2012-01-18 14:42:02 UTC
Thanks!

Note You need to log in before you can comment on or make changes to this bug.


Navigation
Links