Last modified: 2014-01-03 15:50:00 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and for historical purposes. It is not possible to log in and except for displaying bug reports and their history, links might be broken. See T61341, the corresponding Phabricator task for complete and up-to-date bug report information.
Bug 59341 - DBQ-89 Finding unlinked articles / finding empty but linked articles
DBQ-89 Finding unlinked articles / finding empty but linked articles
Status: RESOLVED FIXED
Product: Tool Labs tools
Classification: Unclassified
Database Queries (Other open bugs)
unspecified
All All
: Unprioritized minor
: ---
Assigned To: Bugzilla Bug Importer (valhallasw)
:
Depends on:
Blocks:
  Show dependency treegraph
 
Reported: 2014-01-03 15:49 UTC by Bugzilla Bug Importer (valhallasw)
Modified: 2014-01-03 15:50 UTC (History)
0 users

See Also:
Web browser: ---
Mobile Platform: ---
Assignee Huggle Beta Tester: ---


Attachments

Description Bugzilla Bug Importer (valhallasw) 2014-01-03 15:49:53 UTC
This issue was converted from https://jira.toolserver.org/browse/DBQ-89.
Summary: Finding unlinked articles / finding empty but linked articles
Issue type: Task - A task that needs to be done.
Priority: Minor
Status: Done
Assignee: merl <mewikipedia@to.mabomuja.de>

-------------------------------------------------------------------------------
From: Läuterungsberg  <Ligl@schulevonathen.de>
Date: Tue, 16 Mar 2010 21:33:48
-------------------------------------------------------------------------------

a) Create a list of articles which do exist, but aren´t linked to.  
b) Create a list of articles which do not have any content but are linked to (via a redlink), sorted by the number of redlinks to the article.

Both is meant for the english and the german wikipedia.
Comment 1 Bugzilla Bug Importer (valhallasw) 2014-01-03 15:49:55 UTC
-------------------------------------------------------------------------------
From: merl <mewikipedia@to.mabomuja.de>
Date: Wed, 17 Mar 2010 15:41:59
-------------------------------------------------------------------------------

Some existing tools:  
a)

  * for dewiki  
http://de.wikipedia.org/wiki/Special:LonelyPages  
http://de.wikipedia.org/wiki/Wikipedia:WikiProjekt_Verwaiste_Seiten/Arbeitsseite  
http://de.wikipedia.org/wiki/Wikipedia:WikiProjekt_Verwaiste_Seiten/Kategorien  
http://toolserver.org/~merl/specialpages/dewiki/LonelyPages
  * for enwiki  
http://de.wikipedia.org/wiki/Special:LonelyPages

b) http://toolserver.org/~magnus/missingtopics.php  
http://de.wikipedia.org/wiki/Benutzer:P.Copp/Spezialseiten/Gew%C3%BCnschte_Artikel (an old version)

* * *

Which output format? (simply list/CSV, wiki-syntax, ...)  
What do you mean with "(not) linked to"? I think you only want links included from article namespace (ignoring links from other namespaces)?
Comment 2 Bugzilla Bug Importer (valhallasw) 2014-01-03 15:49:56 UTC
-------------------------------------------------------------------------------
From: merl <mewikipedia@to.mabomuja.de>
Date: Fri, 19 Mar 2010 23:31:19
-------------------------------------------------------------------------------

No response, so i'll run the query.  
a) All articles that are not linked, embedded or redirected. I removed the disambiguationspages. But for dewiki you should use my tool, because my bot daily add these pages to the project page and those aren't lonely anymore.  
b) All links from article namespace to not existing articles  
Here my SGE script
    
    #! /bin/bash
    #$ -N DBQ-89
    #$ -hard
    #$ -l sqlprocs-s5=1
    WIKI="dewiki"
    SQL="
    CREATE TEMPORARY TABLE temp (t_pid INT PRIMARY KEY, t_pt VARCHAR(255));
    INSERT INTO temp
     SELECT page_id, page_title
      FROM ${WIKI}_p.page p
       LEFT JOIN ${WIKI}_p.pagelinks ON p.page_namespace = pl_namespace AND p.page_title=pl_title
       LEFT JOIN ${WIKI}_p.templatelinks ON p.page_namespace = tl_namespace AND p.page_title=tl_title
       LEFT JOIN ${WIKI}_p.redirect ON p.page_namespace = rd_namespace AND p.page_title=rd_title
      WHERE p.page_namespace=0 AND page_is_redirect=0 AND pl_namespace IS NULL AND tl_namespace IS NULL AND rd_namespace IS NULL;
    CREATE TEMPORARY TABLE tempdab(dab_tltitle INT PRIMARY KEY);
    INSERT INTO tempdab
     SELECT pl_title
      FROM ${WIKI}_p.page
       INNER JOIN ${WIKI}_p.pagelinks ON page_id=pl_from
      WHERE page_namespace=8 AND page_title='Disambiguationspage' AND pl_namespace=10;
    CREATE TEMPORARY TABLE tempdel(td_pid INT PRIMARY KEY);
    INSERT IGNORE INTO tempdel
     SELECT t_pid
      FROM temp
       INNER JOIN ${WIKI}_p.templatelinks ON t_pid=tl_from
       INNER JOIN tempdab ON tl_title = dab_tltitle
      WHERE tl_namespace=10;
    DELETE a
     FROM temp a
      INNER JOIN tempdel b ON t_pid=td_pid;
    SELECT REPLACE(t_pt,'_',' ')
     FROM temp
     ORDER BY t_pt;"
    mysql -wBN -h$WIKI-p.db u_$USER -e "$SQL" > ${HOME}/public_html/sql/DBQ-89-${WIKI}-LonelyPages.txt
    SQL="
    CREATE TEMPORARY TABLE temp (pid INT, pt VARCHAR(255), plt VARCHAR(255), INDEX(pt), INDEX(plt));
    INSERT INTO temp
     SELECT page_id, page_title, pl_title
     FROM ${WIKI}_p.page
      INNER JOIN ${WIKI}_p.pagelinks ON pl_from = page_id
     WHERE page_namespace = 0 AND page_is_redirect = 0 AND pl_namespace = 0;
    SELECT CONCAT(REPLACE(plt,'_',' '),',',COUNT(*))
     FROM temp
      LEFT JOIN ${WIKI}_p.page ON page_namespace = 0 AND plt = page_title
     WHERE page_namespace IS NULL
     GROUP BY plt
     HAVING COUNT(*) > 300
     ORDER BY COUNT(*) DESC
     LIMIT 5000;"
    mysql -wBN -h$WIKI-p.db u_$USER -e "$SQL"  > ${HOME}/public_html/sql/DBQ-89-${WIKI}-Redlinks.txt
Comment 3 Bugzilla Bug Importer (valhallasw) 2014-01-03 15:49:58 UTC
-------------------------------------------------------------------------------
From: merl <mewikipedia@to.mabomuja.de>
Date: Tue, 30 Mar 2010 22:35:06
-------------------------------------------------------------------------------

  * Redlinks 
    * http://toolserver.org/~merl/sql/DBQ-89-enwiki-Redlinks.txt
    * http://toolserver.org/~merl/sql/DBQ-89-dewiki-Redlinks.txt
  * Because bots are linking loneypages you should use the project pages. Query was not successful 
    * http://en.wikipedia.org/wiki/Category:All_orphaned_articles
    * http://de.wikipedia.org/wiki/Wikipedia:WikiProjekt_Verwaiste_Seiten/Kategorien
    * http://toolserver.org/~merl/specialpages/dewiki/LonelyPages
Comment 4 Bugzilla Bug Importer (valhallasw) 2014-01-03 15:50:00 UTC
This bug was imported as RESOLVED. The original assignee has therefore not been
set, and the original reporters/responders have not been added as CC, to
prevent bugspam.

If you re-open this bug, please consider adding these people to the CC list:
Original assignee: bugreporter@to.mabomuja.de
CC list: bugreporter@to.mabomuja.de

Note You need to log in before you can comment on or make changes to this bug.


Navigation
Links