Last modified: 2014-11-17 09:21:08 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and kept for historical purposes. It is not possible to log in, and apart from displaying bug reports and their history, links might be broken. See T44259, the corresponding Phabricator task, for complete and up-to-date bug report information.
Bug 42259 - Make domas' pageviews data available in semi-publicly queryable database format
Status: NEW
Product: Wikimedia
Classification: Unclassified
Component: General/Unknown
Version: wmf-deployment
Hardware: All All
Importance: Normal enhancement with 4 votes
Target Milestone: ---
Assigned To: Dan Andreescu
http://lists.wikimedia.org/pipermail/...
http://thread.gmane.org/gmane.science...
Keywords: analytics
Depends on:
Blocks: 54184
 
Reported: 2012-11-19 11:18 UTC by Nemo
Modified: 2014-11-17 09:21 UTC (History)
CC List: 23 users

See Also:
Web browser: ---
Mobile Platform: ---
Assignee Huggle Beta Tester: ---



Description Nemo 2012-11-19 11:18:42 UTC
This doesn't seem to be tracked yet.
It's been discussed countless times in the past few years: for all sorts of GLAM initiatives and any other initiative to improve content on the projects, we currently rely on Henrik's stats.grok.se data in JSON format, e.g. https://toolserver.org/~emw/index.php?c=wikistats , http://toolserver.org/~magnus/glamorous.php etc.
The data on domas' logs should be available for easy querying on the Toolserver databases and elsewhere, but previous attempts to create such a DB led nowhere as far as I know.

I suppose this is already one of the highest priorities in the analytics team plans for the new infrastructure, but I wasn't able to confirm it by reading the public documents and it needs to be done anyway sooner or later.

(Not in "Usage statistics" aka "Statistics" component because that's only about raw pageviews data.)
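
As a rough illustration of what a "queryable database format" could look like, the sketch below loads a few hourly pagecount lines into an in-memory SQLite table and sums views per day for one page. It assumes the raw hourly dumps use space-separated lines of the form `project page_title views bytes`; the sample rows, the `pageviews` table, its column names and the `load_hour` / `daily_views` helpers are all hypothetical, not an existing schema or API.

```python
import sqlite3

# Hypothetical sample lines, assuming the raw hourly dump layout:
# "<project> <page_title> <views> <bytes>"
SAMPLE_HOURS = {
    "2012-11-19T10": ["en Main_Page 1000 5000", "en Python 40 900"],
    "2012-11-19T11": ["en Main_Page 1200 6000", "de Python 7 300"],
}


def load_hour(conn, hour, lines):
    """Insert one hour of pagecount lines into the (hypothetical) table."""
    rows = []
    for line in lines:
        parts = line.split(" ")
        if len(parts) != 4:
            continue  # skip lines that do not match the assumed layout
        project, title, views, _size = parts
        rows.append((hour, project, title, int(views)))
    conn.executemany("INSERT INTO pageviews VALUES (?, ?, ?, ?)", rows)


def daily_views(conn, project, title):
    """Return [(day, total_views)] for one page, ordered by day."""
    cur = conn.execute(
        "SELECT substr(hour, 1, 10) AS day, SUM(views) FROM pageviews "
        "WHERE project = ? AND title = ? GROUP BY day ORDER BY day",
        (project, title),
    )
    return cur.fetchall()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE pageviews (hour TEXT, project TEXT, title TEXT, views INTEGER)"
)
for hour, lines in SAMPLE_HOURS.items():
    load_hour(conn, hour, lines)
```

Calling `daily_views(conn, "en", "Main_Page")` on the sample data returns one row per day with the summed hourly counts. Keeping the hour as a sortable text key is only one possible design; a real deployment would need indexes and a storage plan sized for the full dumps.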
Comment 1 Diederik van Liere 2012-11-19 17:41:26 UTC
This is totally on our roadmap, and the Analytics Team is working on this as part of Kraken.
Comment 2 Emw 2012-11-20 02:23:53 UTC
Diederik, does the Analytics Team plan to make hourly data queryable?  I think being able to see how hourly viewing patterns change over long time periods would be pretty valuable.
Comment 3 Diederik van Liere 2012-11-20 02:27:51 UTC
YES! we totally are planning on doing that.
Comment 4 Ben 2013-02-11 15:05:35 UTC
Henrik's Pageviews tool linked from the History tab on English Wikipedia seems buggy or broken, as mentioned [[User talk:Henrik#What article rank means exactly|here]]. I think that it would be trivial to fix it or replace it, as I mention [[User talk:West.andrew.g/Popular_pages/Archive 1#Possible WMF labs support for your good work|here]]. There already is [http://toolserver.org/~johang/wikitrends/english-most-visited-this-week.html this], but it's only for the top ten, and it's only linked to from English (and Japanese, for the Japanese version) Wikipedias (though it'd take a lot of looking to find it even there). I'd guess that maybe 10% of the articles get 90% of the traffic.  If this is the case, it would be useful to have a list of the top 10% (in the past month or the past year) so as to determine which articles are most popular but badly need improvement (improving much-viewed pages has more effect on the perceived quality of Wikipedia than improving seldom-viewed pages). Such a list, done only once a month -- or even only once a year -- would be extremely useful.
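
The "top 10%" list suggested above could be derived from per-page totals roughly as in the following sketch, which also reports what share of all traffic those pages receive. The page names and counts are made up for illustration, and `top_share` is a hypothetical helper, not part of any existing tool.

```python
# Made-up monthly view counts per page, for illustration only.
SAMPLE_VIEWS = {
    "Page_A": 900, "Page_B": 20, "Page_C": 20, "Page_D": 15, "Page_E": 15,
    "Page_F": 10, "Page_G": 8, "Page_H": 6, "Page_I": 4, "Page_J": 2,
}


def top_share(views_by_page, fraction=0.10):
    """Return (titles, share) for the most-viewed `fraction` of pages.

    `share` is the portion of total views that those pages account for.
    """
    ranked = sorted(views_by_page.items(), key=lambda kv: kv[1], reverse=True)
    n_top = max(1, int(len(ranked) * fraction))  # always keep at least one page
    top = ranked[:n_top]
    total = sum(views_by_page.values())
    share = sum(views for _, views in top) / total if total else 0.0
    return [title for title, _ in top], share


top_pages, share = top_share(SAMPLE_VIEWS)
```

Running this once a month over aggregated counts would be enough to produce the kind of list described; whether real traffic follows the guessed 10%/90% split is something only the actual data can show.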
Comment 5 Ben 2013-02-12 01:59:28 UTC
Some WikiProjects are compiling popularity data and using it to improve popular articles, see [[Wikipedia:WikiProject Medicine/Popular_pages]]. But popularity data really needs to be readily available to other projects and other (foreign-) language Wikipedias. Already one person has done a [http://toolserver.org/~johang/2012.html top 100 for 2012] (including other-language Wikipedias) but ideally this would be extended to the top 5000 or top 10% -- and also linked to from the other foreign-language Wikipedias, as few people seem to know about it.
Hourly data is surely only of commercial interest -- it would help people know which hours and days are best for paid advertising in search engines. [[User:LittleBenW]]
Comment 6 MZMcBride 2013-02-12 02:15:43 UTC
(In reply to comment #3)
> YES! we totally are planning on doing that.

Is there a status update (or page on mediawiki.org) tracking this feature request?
Comment 7 Diederik van Liere 2013-03-21 15:10:14 UTC
See https://mingle.corp.wikimedia.org/projects/analytics/cards/113 for progress. Would love your input regarding In Scope / Out of Scope and User stories. Just add them to this thread in Bugzilla and I will add them to the mingle card.
Comment 8 Tim Landscheidt 2013-05-28 20:03:52 UTC
(In reply to comment #7)
> See https://mingle.corp.wikimedia.org/projects/analytics/cards/113 for
> progress. Would love your input regarding In Scope / Out of Scope and User
> stories. Just add them to this thread in Bugzilla and I will add them to the
> mingle card.

In http://permalink.gmane.org/gmane.science.linguistics.wikipedia.technical/67248 you planned a sprint at the Amsterdam hackathon.  Was it successful?
Comment 9 Diederik van Liere 2013-06-03 10:55:55 UTC
I've got a first draft of the puppet manifest, it needs some more work. 
@Nemo: I don't have access to the private conversations on the cultural wikimedia mailinglists. Can we have these discussions on wikimedia-analytics mailinglist?
Comment 10 Nemo 2013-06-03 11:11:49 UTC
(In reply to comment #9)
> @Nemo: I don't have access to the private conversations on the cultural
> wikimedia mailinglists. Can we have these discussions on wikimedia-analytics
> mailinglist?

You could ask for access to see the archives. Most of those discussions happened before the analytics list existed. In any case, I don't control where the topic is raised: this is a widespread need, so it pops up everywhere; I just collect links.
Comment 11 James Heilman 2013-09-19 02:19:08 UTC
We at Wikiproject Medicine are really interested to know how many medical articles there are and which are top viewed in other languages. https://meta.wikimedia.org/wiki/WikiProject_Med/Tech#Metrics_requests

Looking forward to seeing a tool that can do this. Please let me know if there is anything I can do to help. The other thing that may be needed is a bot to automatically tag articles in other languages either by "Wikiproject" or as categories.
Comment 12 Nemo 2013-10-21 14:50:48 UTC
For the latest updates, see the thread http://lists.wikimedia.org/pipermail/analytics/2013-October/thread.html#1062 "[Analytics] Back of the envelope data size", where requirements for this bug were discussed a bit.
Comment 13 Nemo 2013-11-14 10:35:45 UTC
According to corridor rumors :), Dario at Wikimania said that some sort of pageview data is going, at some point, to be integrated into [[mw:Wikimetrics]]. If true, where is this tracked/documented and is it a parallel effort or something depending on this bug?
Comment 14 Nemo 2014-01-09 16:30:52 UTC
Given the silence since October, I checked the project pages a bit. I can't find any real mention of pageviews under [[wikitech:Analytics]] and under [[mw:Analytics]] there are no actual mentions of them related to Kraken other than "examples of the sort of thing Kraken might store" at [[mw:Analytics/Kraken/Researcher analysis]].

As such, I believe that, in the current state of things, moving this bug under "Analytics" and specifically "Kraken" was premature cookie-licking, and I'm moving it to the generic component for this sort of issue so that it can be picked up by whatever person or project wishes to. The "analytics" keyword stays.
Comment 15 phoebe 2014-01-13 20:48:06 UTC
Just a +1 that the stats that you can get from http://stats.grok.se (pageviews for a particular article, presented in a pretty little graph) are VERY helpful for all of us who do outreach work, presentations, education, work with GLAMs, etc -- not to mention for simple curiosity :) I'd love to see a tool that could do this for all languages.
Comment 16 Emw 2014-04-05 15:43:23 UTC
Any updates on this?  

My Wikipedia traffic visualization tool (https://toolserver.org/~emw/wikistats/) was among those listed as motivation for this ticket, but I recently decommissioned it.  I see that a pageview API is on the Analytics team's Q2 2014 priorities list, but it's last: https://www.mediawiki.org/w/index.php?title=Analytics/Prioritization_Planning&oldid=850355.  

My reasons for ceasing development and eventually maintenance of that tool are mostly unrelated to the lack of progress on this issue, but it was a notable factor.  For now I'm pointing folks to http://tools.wmflabs.org/wikiviewstats/.  However, I'm not aware of any tool other than mine that enables users to, in a single graph, visualize daily page views for up to 5 years, compare such data for multiple articles in a language / one article in multiple languages, or to view that data in table format and download it as a CSV.
Comment 17 mrjohncummings 2014-04-05 19:00:08 UTC
I'd like to mention Magnus Manske's blog post about this, it includes a reply by Toby Negrin, Head of Analytics at the Wikimedia Foundation http://magnusmanske.de/wordpress/?p=173
Comment 18 Nemo 2014-04-12 12:31:01 UTC
(In reply to mrjohncummings from comment #17)
> I'd like to mention Magnus Manske's blog post about this, it includes a
> reply by Toby Negrin, Head of Analytics at the Wikimedia Foundation
> http://magnusmanske.de/wordpress/?p=173

I also asked a question at https://meta.wikimedia.org/wiki/Grants_talk:APG/Proposals/2013-2014_round2/Wikimedia_Foundation/Proposal_form#Multiplication_of_tools
