Last modified: 2014-02-28 16:21:15 UTC
For every source we have a main category, a tracker template and a tracker category. I'll take the Netherlands as an example: * Main Category: https://commons.wikimedia.org/wiki/Category:Rijksmonumenten * Tracker template: https://commons.wikimedia.org/wiki/Template:Rijksmonument * Tracker category: https://commons.wikimedia.org/wiki/Category:Rijksmonumenten_with_known_IDs Every image and category that is tagged with the tracker template should also be somewhere in the tree under the main category (excluding the tracker category). This can go down several steps: * https://commons.wikimedia.org/wiki/Category:Rijksmonumenten * https://commons.wikimedia.org/wiki/Category:Rijksmonumenten_in_North_Holland * https://commons.wikimedia.org/wiki/Category:Rijksmonumenten_in_Haarlem * https://commons.wikimedia.org/wiki/Category:Grote_Kerk,_Haarlem * <some image> The tool should report for which categories and for which images this is not the case so it can be fixed.
The query to get this is quite horrible: SELECT CONCAT('*[[:{{subst:ns:', tracked.page_namespace, '}}:', tracked.page_title, ']]') FROM page AS tracked JOIN categorylinks AS trackcat ON tracked.page_id=trackcat.cl_from WHERE (page_namespace=6 OR page_namespace=14) AND page_is_redirect=0 AND cl_to='Rijksmonumenten_with_known_IDs' AND NOT EXISTS( SELECT * FROM categorylinks AS c1 JOIN page AS p1 ON c1.cl_to=p1.page_title JOIN categorylinks AS c2 ON p1.page_id=c2.cl_from JOIN page AS p2 ON c2.cl_to=p2.page_title JOIN categorylinks AS c3 ON p2.page_id=c3.cl_from JOIN page AS p3 ON c3.cl_to=p3.page_title JOIN categorylinks AS c4 ON p3.page_id=c4.cl_from JOIN page AS p4 ON c4.cl_to=p4.page_title JOIN categorylinks AS c5 ON p4.page_id=c5.cl_from JOIN page AS p5 ON c5.cl_to=p5.page_title JOIN categorylinks AS c6 ON p5.page_id=c6.cl_from JOIN page AS p6 ON c6.cl_to=p6.page_title JOIN categorylinks AS c7 ON p6.page_id=c7.cl_from JOIN page AS p7 ON c7.cl_to=p7.page_title WHERE tracked.page_id=c1.cl_from AND NOT c1.cl_to='Rijksmonumenten_with_known_IDs' AND NOT p1.page_title='Rijksmonumenten_with_known_IDs' AND p1.page_namespace=14 AND p1.page_is_redirect=0 AND NOT c2.cl_to='Rijksmonumenten_with_known_IDs' AND NOT p2.page_title='Rijksmonumenten_with_known_IDs' AND p2.page_namespace=14 AND p2.page_is_redirect=0 AND NOT c3.cl_to='Rijksmonumenten_with_known_IDs' AND NOT p3.page_title='Rijksmonumenten_with_known_IDs' AND p3.page_namespace=14 AND p3.page_is_redirect=0 AND NOT c4.cl_to='Rijksmonumenten_with_known_IDs' AND NOT p4.page_title='Rijksmonumenten_with_known_IDs' AND p4.page_namespace=14 AND p4.page_is_redirect=0 AND NOT c5.cl_to='Rijksmonumenten_with_known_IDs' AND NOT p5.page_title='Rijksmonumenten_with_known_IDs' AND p5.page_namespace=14 AND p5.page_is_redirect=0 AND NOT c6.cl_to='Rijksmonumenten_with_known_IDs' AND NOT p6.page_title='Rijksmonumenten_with_known_IDs' AND p6.page_namespace=14 AND p6.page_is_redirect=0 AND NOT c7.cl_to='Rijksmonumenten_with_known_IDs' AND NOT p7.page_title='Rijksmonumenten_with_known_IDs' AND p7.page_namespace=14 AND p7.page_is_redirect=0 AND (c1.cl_to='Rijksmonumenten' OR c2.cl_to='Rijksmonumenten' OR c3.cl_to='Rijksmonumenten' OR c4.cl_to='Rijksmonumenten' OR c5.cl_to='Rijksmonumenten' OR c6.cl_to='Rijksmonumenten' OR c7.cl_to='Rijksmonumenten') ) /* ORDER BY tracked.page_namespace DESC, tracked.page_title ASC */ LIMIT 500; I used this to create the report at https://commons.wikimedia.org/wiki/Category_talk:Rijksmonumenten#Categories_not_in_the_tree