Last modified: 2014-11-17 09:45:00 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and kept for historical purposes. It is not possible to log in, and apart from displaying bug reports and their history, links might be broken. See T32848, the corresponding Phabricator task, for complete and up-to-date bug report information.
Bug 30848 - Data Request: Aggregated-to-country-code traffic data for different language versions (no IP addresses needed)
Status: NEW
Product: Datasets
Classification: Unclassified
Component: Webstatscollector
Version: unspecified
Hardware: All All
Importance: Normal normal
Target Milestone: ---
Assigned To: Nobody - You can work on this!
Keywords: analytics
Depends on:
Blocks:
Reported: 2011-09-10 14:13 UTC by hanteng
Modified: 2014-11-17 09:45 UTC
CC List: 3 users

Description hanteng 2011-09-10 14:13:43 UTC
As advised by Erik Zachte, two researchers from the Oxford Internet Institute request traffic data aggregated to country code (and, if possible, aggregated more finely to longitude/latitude points) for all available language versions. The data will be used to improve the mapping in the Wikimedia Traffic Analysis Report, which should benefit public understanding of Wikimedia's multilingual development. The resulting maps will be released under a copyleft license.

* Existing tables provided by Erik Zachte's Wikimedia Traffic Analysis Report
A very interesting data presentation that shows how different languages are accessed across different regions (based on the geoIP categorization):
http://stats.wikimedia.org/wikimedia/squids/SquidReportPageViewsPerLanguageBreakdown.htm

* Proposed enhancement by our pilot mapping
We would like to show on a geographic map how each language version is accessed across the world. This needs more detailed data, especially for those languages with sizable traffic in the "other" categories of the table at the link above.

* Two researchers
Dr. Mark Graham works on analysing patterns in Wikipedia (e.g. http://www.oii.ox.ac.uk/research/projects/?id=66). Mr. Han-Teng Liao has been using the MaxMind geoIP database to map the proportional difference between the external/citation links of Baidu Baike and Chinese Wikipedia; see http://people.oii.ox.ac.uk/hanteng/2011/09/04/difference-in-proportional-emphasis-baidu-baike-and-chinese-wikipedia-comparison/

* Expected outcome
Published maps, released under a copyleft license and stored on Wikimedia Commons. Potential academic articles and blog posts on the language phenomenon in the multilingual Wikipedia project.

* Researchers' sensitivity to privacy concerns and capacity in modern cartography
Both researchers realise the sensitivity of IP data and in no way want to violate users' expectations of privacy (Han-Teng is especially sensitive to this issue, having been involved with Human Rights groups for the Internet industry in DC). This is why we don't want to see any IPs, but would very much like to work with data aggregated to the level of country codes at the very least, or to the level of city or even longitude and latitude points (Dr Mark Graham is a trained geographer with expertise in mapping both offline and online data, as shown at http://www.floatingsheep.org/ ).
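
To make the request concrete, the kind of aggregation asked for could look roughly like the sketch below. It is only an illustration under stated assumptions: the `Request` record layout, the `lookup_country` helper and the toy lookup table are hypothetical and do not describe Wikimedia's actual log format or pipeline. The point is that each IP is resolved to a country code and then dropped, so only per-language, per-country counts are ever handed over.

```python
from collections import Counter
from typing import Callable, Iterable, NamedTuple


class Request(NamedTuple):
    """Hypothetical log record; not the actual squid log layout."""
    ip: str
    language: str  # language version requested, e.g. "zh"


def aggregate_by_country(
    requests: Iterable[Request],
    lookup_country: Callable[[str], str],
) -> Counter:
    """Count requests per (language, country code); IPs are never stored."""
    counts: Counter = Counter()
    for req in requests:
        # Resolve the IP to a country code; the address itself is not kept.
        country = lookup_country(req.ip)
        counts[(req.language, country)] += 1
    return counts


# Toy lookup table standing in for a geoIP database (an assumption).
TOY_GEOIP = {"192.0.2.1": "GB", "192.0.2.2": "GB", "198.51.100.7": "TW"}

sample = [
    Request("192.0.2.1", "en"),
    Request("192.0.2.2", "en"),
    Request("198.51.100.7", "zh"),
    Request("203.0.113.9", "zh"),  # not in the toy table
]

# Unknown addresses fall into a "--" bucket instead of being dropped.
table = aggregate_by_country(sample, lambda ip: TOY_GEOIP.get(ip, "--"))
```

The resulting `table` contains only (language, country code) pairs and their counts, which is the level of detail requested here at the very least; a finer aggregation (city or longitude/latitude point) would swap in a different lookup function.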
Comment 1 Diederik van Liere 2011-12-06 03:10:16 UTC
Hi Hanteng,
Do you need help getting some more traction on this?
Comment 2 Andre Klapper 2012-12-03 13:59:17 UTC
[mass-moving wikistats reports from Wikimedia→Statistics to Analytics→Wikistats to have stats issues under one Bugzilla product (see bug 42088) - sorry for the bugspam!]
