Last modified: 2014-01-03 16:01:57 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and kept for historical purposes. It is not possible to log in, and apart from displaying bug reports and their history, links might be broken. See T61394, the corresponding Phabricator task, for complete and up-to-date bug report information.
Bug 59394 - DBQ-137 statistics of different languages
Status: RESOLVED FIXED
Product: Tool Labs tools
Classification: Unclassified
Component: Database Queries
Version: unspecified
Hardware: All All
Importance: Unprioritized major
Target Milestone: ---
Assigned To: Bugzilla Bug Importer (valhallasw)
Depends on:
Blocks:
Reported: 2014-01-03 16:01 UTC by Bugzilla Bug Importer (valhallasw)
Modified: 2014-01-03 16:01 UTC (History)
CC List: 0 users


Description Bugzilla Bug Importer (valhallasw) 2014-01-03 16:01:49 UTC
This issue was converted from https://jira.toolserver.org/browse/DBQ-137.
Summary: statistics of different languages
Issue type: Task - A task that needs to be done.
Priority: Major
Status: Done
Assignee: Hoo man <hoo@online.de>

-------------------------------------------------------------------------------
From: Minn Seok Choi <MinnSeok.Choi@gmail.com>
Date: Wed, 20 Apr 2011 19:21:49
-------------------------------------------------------------------------------

I am not sure whether it is possible to retrieve some data from the Wikipedia databases. If it is, I would like to get the following variables for each of the Wikipedias in the list below:

A. total pages in each namespace (excluding redirects)  
(1) the number of article pages (i.e. main namespace pages)  
(2) the number of talk pages  
(3) the number of user pages  
(4) the number of user talk pages  
(5) the number of Wikipedia pages  
(6) the number of Wikipedia talk pages  
(7) the number of file pages  
(8) the number of file talk pages  
(9) the number of template pages  
(10) the number of template talk pages  
(11) the number of portal pages  
(12) the number of portal talk pages  
(13) the number of help pages  
(14) the number of help talk pages

B. total edits to each namespace (excluding redirects)  
(15) the number of edits to article pages (i.e. main namespace pages)  
(16) the number of edits to talk pages  
(17) the number of edits to user pages  
(18) the number of edits to user talk pages  
(19) the number of edits to Wikipedia pages  
(20) the number of edits to Wikipedia talk pages  
(21) the number of edits to file pages  
(22) the number of edits to file talk pages  
(23) the number of edits to template pages  
(24) the number of edits to template talk pages  
(25) the number of edits to portal pages  
(26) the number of edits to portal talk pages  
(27) the number of edits to help pages  
(28) the number of edits to help talk pages

C. size of each namespace in bytes (excluding redirects)  
(29) the size of article pages (i.e. main namespace pages)  
(30) the size of talk pages  
(31) the size of user pages  
(32) the size of user talk pages  
(33) the size of Wikipedia pages  
(34) the size of Wikipedia talk pages  
(35) the size of file pages  
(36) the size of file talk pages  
(37) the size of template pages  
(38) the size of template talk pages  
(39) the size of portal pages  
(40) the size of portal talk pages  
(41) the size of help pages  
(42) the size of help talk pages

D. URL for certain pages  
(43) the URL of the community portal page (if available)  
(44) the URL of the village pump (if available)  
(45) the URL of help desk  
(46) the URL of Featured article portal

== the Wikipedia list (68 languages) ==

en English  
de German  
fr French  
pl Polish  
it Italian  
ja Japanese  
es Spanish  
ru Russian  
pt Portuguese  
nl Dutch  
sv Swedish  
zh Chinese  
ca Catalan  
no Norwegian (Bokmål)  
uk Ukrainian  
fi Finnish  
vi Vietnamese  
cs Czech  
hu Hungarian  
ko Korean  
ro Romanian  
id Indonesian  
tr Turkish  
da Danish  
ar Arabic  
eo Esperanto  
sr Serbian  
lt Lithuanian  
sk Slovak  
he Hebrew  
ms Malay  
bg Bulgarian  
sl Slovenian  
hr Croatian  
et Estonian  
simple Simple English  
th Thai  
eu Basque  
nn Norwegian (Nynorsk)  
el Greek  
az Azerbaijani  
la Latin  
tl Tagalog  
te Telugu  
ka Georgian  
sh Serbo-Croatian  
be-x-old Belarusian (Taraškievica)  
lv Latvian  
jv Javanese  
sq Albanian  
bs Bosnian  
is Icelandic  
ta Tamil  
an Aragonese  
oc Occitan  
bn Bengali  
ml Malayalam  
af Afrikaans  
ur Urdu  
zh-yue Cantonese  
ast Asturian  
yo Yoruba  
wa Walloon  
yi Yiddish  
uz Uzbek  
li Limburgian  
ia Interlingua  
szl Silesian
Comment 1 Bugzilla Bug Importer (valhallasw) 2014-01-03 16:01:51 UTC
-------------------------------------------------------------------------------
From: Hoo man <hoo@online.de>
Date: Fri, 22 Apr 2011 18:39:33
-------------------------------------------------------------------------------

The following is feasible: 1-14 and (maybe) 29-42.  
Please confirm that this data alone is useful to you, and please give me the language codes (like en for English, sq for Albanian) for the above languages (I'm too lazy to get them myself ![][1] ).

   [1]: https://jira.toolserver.org/images/icons/emoticons/tongue.gif
Comment 2 Bugzilla Bug Importer (valhallasw) 2014-01-03 16:01:52 UTC
-------------------------------------------------------------------------------
From: Minn Seok Choi <MinnSeok.Choi@gmail.com>
Date: Sat, 23 Apr 2011 08:54:04
-------------------------------------------------------------------------------

Thanks, Hoo man. 1-14 and 29-42 are useful for me. I updated my query request by adding the language codes, following your comment.
Comment 3 Bugzilla Bug Importer (valhallasw) 2014-01-03 16:01:54 UTC
-------------------------------------------------------------------------------
From: Hoo man <hoo@online.de>
Date: Sun, 24 Apr 2011 18:08:20
-------------------------------------------------------------------------------

Ok, fine, thanks for the language codes ![][1]  
Code (did it in PHP because I once again was too lazy for bash ![][2]):
    
    #!/bin/php
    <?php
    // Database name prefixes of the 68 requested wikis (note the underscores in be_x_old and zh_yue).
    $langcodes = array('en', 'de', 'fr', 'pl', 'it', 'ja', 'es', 'ru', 'pt', 'nl', 'sv', 'zh', 'ca', 'no', 'uk', 'fi', 'vi', 'cs', 'hu', 'ko', 'ro', 'id', 'tr', 'da', 'ar', 'eo', 'sr', 'lt', 'sk', 'he', 'ms', 'bg', 'sl', 'hr', 'et', 'simple', 'th', 'eu', 'nn', 'el', 'az', 'la', 'tl', 'te', 'ka', 'sh', 'be_x_old', 'lv', 'jv', 'sq', 'bs', 'is', 'ta', 'an', 'oc', 'bn', 'ml', 'af', 'ur', 'zh_yue', 'ast', 'yo', 'wa', 'yi', 'uz', 'li', 'ia', 'szl');
    // All results are appended to this one file.
    $file = '../public_html/dbq/dbq-137.txt';
    foreach($langcodes as $lang) {
    	// Per namespace: number of non-redirect pages (items 1-14) and their summed page_len in bytes (items 29-42).
    	$query = 'SELECT /* SLOW_OK */ \'' . $lang . '\' as lang, page_namespace, COUNT(*) as page_count, SUM(page_len) as namespace_size FROM page WHERE page_namespace IN(0,1,2,3,4,5,6,7,10,11,100,101,12,13) AND page_is_redirect = 0 GROUP BY page_namespace;';
    	echo 'Executing "' . $query .'" on ' . $lang . "wiki_p\n";
    	// Run the query against this wiki's replica and append the output to $file.
    	exec('mysql --host=' . $lang . 'wiki-p.rrdb.toolserver.org --database=' . $lang . 'wiki_p -e"' . $query . '" | cat >> ' . $file);
    }
    ?>
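
The command string in the script above is assembled by plain concatenation, which is fine for a fixed, trusted list of language codes. As a minimal sketch of a more defensive variant, the hypothetical helper `build_mysql_command()` below (not part of the original script) quotes every argument with PHP's `escapeshellarg()`. The host and database patterns are simply copied from the script above, and the example only prints the command instead of running it.

```php
<?php
// Hypothetical helper (not in the original script): builds the same kind of
// mysql invocation, but shell-quotes every argument with escapeshellarg().
function build_mysql_command(string $lang, string $query, string $file): string
{
    $host = $lang . 'wiki-p.rrdb.toolserver.org'; // host pattern copied from the script above
    $db   = $lang . 'wiki_p';                     // database pattern copied from the script above

    return 'mysql --host=' . escapeshellarg($host)
        . ' --database=' . escapeshellarg($db)
        . ' -e ' . escapeshellarg($query)
        . ' >> ' . escapeshellarg($file);
}

// Usage: print the command for one wiki rather than executing it.
echo build_mysql_command('sq', 'SELECT COUNT(*) FROM page;', 'dbq-137.txt'), "\n";
```

With this shape, the query text can contain single or double quotes without the caller having to escape them by hand.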
    

Result:  
http://toolserver.org/~hoo/dbq/dbq-137.txt (plain text)  
http://toolserver.org/~hoo/dbq/dbq-137.csv (Excel readable csv)
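
The thread does not say how the CSV file was produced from the plain-text output. One possible approach is sketched below, assuming the plain-text file holds mysql's tab-separated batch output with the header line repeated once per wiki. `NAMESPACE_NAMES` and `tsv_to_csv_rows()` are hypothetical names, the namespace labels follow the request list above (100/101 as Portal is an assumption, since extra namespaces are configured per wiki), and the sample input is made up for illustration.

```php
<?php
// Labels for the namespace IDs used in the query. 100/101 = Portal is an
// assumption: extra namespaces are configured per wiki.
const NAMESPACE_NAMES = [
    0 => 'Article', 1 => 'Talk', 2 => 'User', 3 => 'User talk',
    4 => 'Wikipedia', 5 => 'Wikipedia talk', 6 => 'File', 7 => 'File talk',
    10 => 'Template', 11 => 'Template talk', 12 => 'Help', 13 => 'Help talk',
    100 => 'Portal', 101 => 'Portal talk',
];

// Hypothetical converter: tab-separated mysql batch output in, CSV rows out.
function tsv_to_csv_rows(string $tsv): array
{
    $rows = [['lang', 'namespace', 'namespace_name', 'page_count', 'namespace_size']];
    foreach (preg_split('/\R/', trim($tsv)) as $line) {
        $fields = explode("\t", $line);
        // Skip malformed lines and the header line repeated for every wiki.
        if (count($fields) !== 4 || $fields[0] === 'lang') {
            continue;
        }
        [$lang, $ns, $count, $size] = $fields;
        $name = NAMESPACE_NAMES[(int) $ns] ?? 'unknown';
        $rows[] = [$lang, (int) $ns, $name, (int) $count, (int) $size];
    }
    return $rows;
}

// Made-up sample input, for illustration only (not real results).
$sample = "lang\tpage_namespace\tpage_count\tnamespace_size\n"
    . "xx\t0\t10\t2048\n"
    . "xx\t1\t3\t512\n";

$out = fopen('php://output', 'w');
foreach (tsv_to_csv_rows($sample) as $row) {
    fputcsv($out, $row, ',', '"', ''); // explicit delimiter, enclosure, and (empty) escape
}
fclose($out);
```

Keeping the numeric namespace ID next to the label means the rows can still be matched against the raw query output.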

   [1]: https://jira.toolserver.org/images/icons/emoticons/smile.gif
   [2]: https://jira.toolserver.org/images/icons/emoticons/tongue.gif
Comment 4 Bugzilla Bug Importer (valhallasw) 2014-01-03 16:01:55 UTC
-------------------------------------------------------------------------------
From: Minn Seok Choi <MinnSeok.Choi@gmail.com>
Date: Mon, 25 Apr 2011 19:53:25
-------------------------------------------------------------------------------

Thank you so much, Hoo man.
Comment 5 Bugzilla Bug Importer (valhallasw) 2014-01-03 16:01:57 UTC
This bug was imported as RESOLVED. The original assignee has therefore not been
set, and the original reporters/responders have not been added as CC, to
prevent bugspam.

If you re-open this bug, please consider adding these people to the CC list:
Original assignee: hoo@online.de
CC list: hoo@online.de
