Last modified: 2014-01-03 15:55:24 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and kept for historical purposes. It is not possible to log in, and apart from displaying bug reports and their history, links might be broken. See T61356, the corresponding Phabricator task, for complete and up-to-date bug report information.
Bug 59356 - DBQ-104 List of mainspace articles less than 0.5kb in size in Tamil Wikipedia with date of creation
Status: RESOLVED FIXED
Product: Tool Labs tools
Classification: Unclassified
Component: Database Queries
Version: unspecified
Hardware: All All
Importance: Unprioritized minor
Target Milestone: ---
Assigned To: Bugzilla Bug Importer (valhallasw)
Reported: 2014-01-03 15:55 UTC by Bugzilla Bug Importer (valhallasw)
Modified: 2014-01-03 15:55 UTC (History)

Description Bugzilla Bug Importer (valhallasw) 2014-01-03 15:55:13 UTC
This issue was converted from https://jira.toolserver.org/browse/DBQ-104.
Summary: List of mainspace articles less than 0.5kb in size in Tamil Wikipedia with date of creation
Issue type: Task - A task that needs to be done.
Priority: Minor
Status: Done
Assignee: EdoDodo <dodo.wikipedia@gmail.com>

-------------------------------------------------------------------------------
From: Sodabottle  <sodabottle@gmail.com>
Date: Fri, 24 Sep 2010 22:14:20
-------------------------------------------------------------------------------

I am trying to create a list of Tamil Wikipedia articles that are less than 0.5kb in size. This is for a new wikiproject to develop stubs in ta.wiki.

The list is for mainspace articles only; redirects, templates and categories shouldn't be included.
Comment 1 Bugzilla Bug Importer (valhallasw) 2014-01-03 15:55:15 UTC
-------------------------------------------------------------------------------
From: EdoDodo <dodo.wikipedia@gmail.com>
Date: Sat, 25 Sep 2010 08:57:45
-------------------------------------------------------------------------------

I ran the following query:

SELECT page_id, page_len FROM page  
WHERE page_len < 512 AND page_namespace = 0 AND page_is_redirect != 1  
ORDER BY page_len

Then, I did some post-processing using the API and the page IDs to get the page titles properly encoded (the database did not return them encoded properly). I then combined the results of the API queries and that of the database query and exported them to a CSV file (UTF-8 encoded), which is attached. After that, I ran a bunch of regex find-and-replaces manually to get the data from the CSV into a wikitable, which is also attached. Feel free to use whichever format is more convenient for you.
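
The CSV-to-wikitable step described above was done by hand with regex find-and-replace. A minimal sketch of the same conversion in Python follows. The column order (page ID, length, title) and the table header labels are assumptions, since the attached files are not shown here, and `csv_to_wikitable` is a hypothetical helper name.

```python
import csv
import io


def csv_to_wikitable(csv_text):
    """Turn CSV rows of (page_id, page_len, title) into wikitable markup.

    The three-column layout is an assumption about the attached CSV.
    """
    lines = ['{| class="wikitable sortable"', "! Page ID !! Length (bytes) !! Title"]
    for row in csv.reader(io.StringIO(csv_text)):
        if len(row) != 3:
            # Skip blank or malformed lines instead of failing.
            continue
        page_id, page_len, title = row
        lines.append("|-")
        # Link the title so editors can jump straight to the short article.
        lines.append(f"| {page_id} || {page_len} || [[{title}]]")
    lines.append("|}")
    return "\n".join(lines)


# Made-up sample rows, not data from the actual query.
sample = "101,12,Example A\n102,340,Example B\n"
print(csv_to_wikitable(sample))
```

Using the `csv` module rather than regexes means titles that contain commas are handled as long as they are quoted in the CSV.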

Only 150 pages were found, but that should give you a bit to work on :). Unfortunately, the size does not take transcluded templates into account, so some pages are in fact not really that short (for example, your main page is second on the list, because it is made up of just three transcluded templates). The results are ordered by length, with the shortest ones first.

Anyway, good luck and let me know if there's anything else I can do for you.
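
To illustrate the caveat about transcluded templates: `page_len` is the length in bytes of the page's raw wikitext, so a page that consists only of template calls looks tiny regardless of what it renders. The sketch below is a rough illustration only. The template names are made up, and `wikitext_size_bytes` is a hypothetical helper, not part of MediaWiki.

```python
def wikitext_size_bytes(wikitext):
    """Size of raw wikitext in UTF-8 bytes, the quantity page_len is assumed to report."""
    return len(wikitext.encode("utf-8"))


# A page built from three template calls (hypothetical template names).
main_page_like = "{{Header}}\n{{Featured}}\n{{Footer}}"

# The word "Tamil" in Tamil script, written with escapes: 5 code points,
# each of which takes 3 bytes in UTF-8.
tamil_word = "\u0ba4\u0bae\u0bbf\u0bb4\u0bcd"

print(wikitext_size_bytes(main_page_like))
print(len(tamil_word), wikitext_size_bytes(tamil_word))
```

Because the limit is measured in bytes, a 512-byte cutoff corresponds to far fewer visible characters of Tamil text than of ASCII text.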
Comment 2 Bugzilla Bug Importer (valhallasw) 2014-01-03 15:55:17 UTC
-------------------------------------------------------------------------------
From: Sodabottle  <sodabottle@gmail.com>
Date: Sat, 25 Sep 2010 09:13:57
-------------------------------------------------------------------------------

Thank you EdoDodo,

This is exactly what I was looking for. Now we can get to work on these. I am actually surprised at the small number of articles. The Wikimedia stats page (http://stats.wikimedia.org/EN/TablesWikipediaTA.htm) says that 18% of our articles are less than 0.5k in size (though it is for May 2010, I am sure we have not done anything to expand the stubs since then). So by our article count, there should be about 5000 articles that are < 0.5k. Any idea why this discrepancy happens?

regards  
Sodabottle
Comment 3 Bugzilla Bug Importer (valhallasw) 2014-01-03 15:55:19 UTC
-------------------------------------------------------------------------------
From: EdoDodo <dodo.wikipedia@gmail.com>
Date: Sat, 25 Sep 2010 09:33:20
-------------------------------------------------------------------------------

Hi,

Hmm... It is strange that there is such a large discrepancy between the two. It's only been 4 months after all, and the overall size of the wiki that was listed hasn't increased an awful lot (although, if a significant part of the size increase was focused on stubs, that would explain it). I'll ask someone to check my query later today and see if I've made any mistakes, but it looks all right to me.

EdoDodo
Comment 4 Bugzilla Bug Importer (valhallasw) 2014-01-03 15:55:21 UTC
-------------------------------------------------------------------------------
From: Guandalug <A.Meiske@nightstone.de>
Date: Sat, 25 Sep 2010 16:57:53
-------------------------------------------------------------------------------

The query is not to be blamed in any case. 

I just ran a COUNT(*) on it and found 145 - it seems somebody is very busy getting rid of short articles.

If the limit is set to 600 bytes, it's 216 articles. With 1024 bytes (1 kB) it'd be a list of 1095 articles.
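
The threshold comparison above can be reproduced on toy data. This is a minimal sketch that assumes an in-memory SQLite table containing only the `page` columns used by the query. The rows are made up and `count_short_articles` is a hypothetical helper, so the counts it prints are not the ones reported for ta.wiki.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE page (page_id INTEGER PRIMARY KEY, page_namespace INTEGER,"
    " page_is_redirect INTEGER, page_len INTEGER)"
)
# Made-up rows: (page_id, page_namespace, page_is_redirect, page_len)
conn.executemany(
    "INSERT INTO page VALUES (?, ?, ?, ?)",
    [
        (1, 0, 0, 100),
        (2, 0, 0, 550),
        (3, 0, 0, 900),
        (4, 0, 1, 30),   # redirect, excluded by the query
        (5, 10, 0, 40),  # non-article namespace, excluded by the query
        (6, 0, 0, 2048),
    ],
)


def count_short_articles(conn, limit_bytes):
    """Count non-redirect mainspace pages shorter than limit_bytes."""
    row = conn.execute(
        "SELECT COUNT(*) FROM page"
        " WHERE page_len < ? AND page_namespace = 0 AND page_is_redirect != 1",
        (limit_bytes,),
    ).fetchone()
    return row[0]


for limit in (512, 600, 1024):
    print(limit, count_short_articles(conn, limit))
```

Passing the limit as a bound parameter keeps the WHERE clause identical to the original query while making the cutoff easy to vary.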
Comment 5 Bugzilla Bug Importer (valhallasw) 2014-01-03 15:55:23 UTC
-------------------------------------------------------------------------------
From: Sodabottle  <sodabottle@gmail.com>
Date: Sat, 25 Sep 2010 17:05:34
-------------------------------------------------------------------------------

Thanks for the confirmation. I started working on the list; that's why the numbers are decreasing.
Comment 6 Bugzilla Bug Importer (valhallasw) 2014-01-03 15:55:24 UTC
This bug was imported as RESOLVED. The original assignee has therefore not been
set, and the original reporters/responders have not been added as CC, to
prevent bugspam.

If you re-open this bug, please consider adding these people to the CC list:
Original assignee: dodo.wikipedia@gmail.com
CC list: dodo.wikipedia@gmail.com, guandalug@nurfuerspam.de, sodabottle@gmail.com
