Last modified: 2014-01-03 15:22:38 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and for historical purposes. It is not possible to log in and except for displaying bug reports and their history, links might be broken. See T61260, the corresponding Phabricator task for complete and up-to-date bug report information.
Bug 59260 - DBQ-7 Shortpages without templates and a single contributor
DBQ-7 Shortpages without templates and a single contributor
Status: RESOLVED FIXED
Product: Tool Labs tools
Classification: Unclassified
Database Queries (Other open bugs)
unspecified
All All
: Unprioritized minor
: ---
Assigned To: Bugzilla Bug Importer (valhallasw)
:
Depends on:
Blocks:
  Show dependency treegraph
 
Reported: 2014-01-03 15:22 UTC by Bugzilla Bug Importer (valhallasw)
Modified: 2014-01-03 15:22 UTC (History)
0 users

See Also:
Web browser: ---
Mobile Platform: ---
Assignee Huggle Beta Tester: ---


Attachments

Description Bugzilla Bug Importer (valhallasw) 2014-01-03 15:22:28 UTC
This issue was converted from https://jira.toolserver.org/browse/DBQ-7.
Summary: Shortpages without templates and a single contributor
Issue type: Task - A task that needs to be done.
Priority: Minor
Status: Done
Assignee: Andreas Gasser <andreas_gasser@yahoo.com>

-------------------------------------------------------------------------------
From: MZMcBride <mzmcbride@gmail.com>
Date: Sun, 06 Jan 2008 01:22:25
-------------------------------------------------------------------------------

I'd like a query run that can get a list of all shortpages on en.wiki that don't contain templates and only have a single contributor. Erwin provided me with the query he uses (below), however I've been told that it can probably be optimized or written in a more efficient manner. Thanks!

/usr/bin/mysql -hsql-s1 --skip-column-names enwiki_p > /home/erwin85/public_html/dbq/shortpages_enwiki.txt <<EOF  
SELECT CONCAT(ns_name, ':', page_title, ' - ', page_len)  
FROM page  
LEFT JOIN toolserver.namespace  
ON page_namespace = ns_id  
WHERE page_len < 50  
AND page_is_redirect = 0  
AND page_namespace NOT IN (2, 3)  
AND (SELECT COUNT(1) FROM templatelinks WHERE tl_from = page_id) = 0  
AND (SELECT COUNT(DISTINCT rev_user_text) FROM revision WHERE rev_page = page_id) = 1  
AND dbname = 'enwiki_p'  
ORDER BY page_len ASC  
EOF
Comment 1 Bugzilla Bug Importer (valhallasw) 2014-01-03 15:22:30 UTC
-------------------------------------------------------------------------------
From: DaB. <dab@ts.wikimedia.org>
Date: Sun, 06 Jan 2008 01:52:10
-------------------------------------------------------------------------------

SELECT page_namespace,page_title,page_len   
FROM revision,page   
LEFT JOIN templatelinks   
ON tl_from = page_id   
WHERE page_len<50   
AND page_is_redirect = 0   
AND page_namespace NOT IN (2, 3)   
AND rev_page=page_id   
AND tl_from IS NULL   
GROUP BY rev_user  
HAVING count(DISTINCT rev_user)<2  
ORDER BY page_len   
LIMIT 20;

seems to do what you want (the join with toolserver.namespace is missing, that's trivial ![][1]). But perhaps Duesentrieb knows a better way ![][2]

Sincerly,  
DaB.

P.S: on dewp the query needs 8 min 5.23 sec and 0.31 sec on cswp.

   [1]: https://jira.toolserver.org/images/icons/emoticons/wink.gif
   [2]: https://jira.toolserver.org/images/icons/emoticons/smile.gif
Comment 2 Bugzilla Bug Importer (valhallasw) 2014-01-03 15:22:32 UTC
-------------------------------------------------------------------------------
From: Daniel Kinzler <daniel@brightbyte.de>
Date: Sun, 06 Jan 2008 14:16:01
-------------------------------------------------------------------------------

The result is at <http://tools.wikimedia.de/~daniel/misc/shortpages-enwiki.txt> (39441), details at <http://wiki.ts.wikimedia.org/view/Query_service/short_pages_enwiki>.

I used the original query, only cleaned it up a bit. I got rid of the first subquery, but as it turns out, mysql was already smart enough to optimize it away anyway.

@DaB: avoiding the second subquery at the cost of running a GROUP on all revisions on enwiki probably makes things worse, not better. Running group on such a huge table is expensive, and running a simple subquery on a set of relatively few pages is probably faster - this is true because page_len < 50 is a pretty strong limitation; if we had no such strong resatriction, grouping would probably be faster.
Comment 3 Bugzilla Bug Importer (valhallasw) 2014-01-03 15:22:33 UTC
-------------------------------------------------------------------------------
From: DaB. <dab@ts.wikimedia.org>
Date: Sun, 06 Jan 2008 23:28:51
-------------------------------------------------------------------------------

mm, your query is a little bit slower then mine. Your query needs 10 min 26.73 sec on dewp and 1 min 24.52 sec on cswp. Any idea, why?

(That's only a question to learn ![][1])

   [1]: https://jira.toolserver.org/images/icons/emoticons/smile.gif
Comment 4 Bugzilla Bug Importer (valhallasw) 2014-01-03 15:22:35 UTC
-------------------------------------------------------------------------------
From: Daniel Kinzler <daniel@brightbyte.de>
Date: Mon, 07 Jan 2008 00:14:00
-------------------------------------------------------------------------------

when i tested, it was a bit faster ![][1] the problem is: caches. to get a reliable benchmark, you would have to try each on a cold cache. if you run either qiery twice, the second go is **much** faster. this may also make the other query faster. also, some query run by someone else just before you tried may have loaded the relevant index into the cache, etc...

   [1]: https://jira.toolserver.org/images/icons/emoticons/smile.gif
Comment 5 Bugzilla Bug Importer (valhallasw) 2014-01-03 15:22:36 UTC
-------------------------------------------------------------------------------
From: Andreas Gasser <andreas_gasser@yahoo.com>
Date: Tue, 28 Jun 2011 10:09:48
-------------------------------------------------------------------------------

http://toolserver.org/
Comment 6 Bugzilla Bug Importer (valhallasw) 2014-01-03 15:22:38 UTC
This bug was imported as RESOLVED. The original assignee has therefore not been
set, and the original reporters/responders have not been added as CC, to
prevent bugspam.

If you re-open this bug, please consider adding these people to the CC list:
Original assignee: (none)
CC list: b@mzmcbride.com, wikimedia-bugzilla@dabpunkt.eu, daniel.kinzler@wikimedia.de

Note You need to log in before you can comment on or make changes to this bug.


Navigation
Links