Last modified: 2014-03-26 11:15:21 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and kept for historical purposes. It is not possible to log in, and apart from pages displaying bug reports and their history, links might be broken. See T64874, the corresponding Phabricator task, for complete and up-to-date bug report information.
Bug 62874 - stats for Wikidata exports
Status: NEW
Product: Analytics
Classification: Unclassified
Component: Wikistats
Version: unspecified
Hardware: All All
Importance: High enhancement
Target Milestone: ---
Assigned To: Nobody - You can work on this!
Depends on:
Blocks:

Reported: 2014-03-20 16:55 UTC by Lydia Pintscher
Modified: 2014-03-26 11:15 UTC (History)
CC List: 5 users


Description Lydia Pintscher 2014-03-20 16:55:30 UTC
Can you please make access statistics for the exports at https://www.wikidata.org/entity/Q1.json and similar available? The available formats are json, rdf, n3, xml and ttl.
Statistics split by format would be most useful.
Comment 1 Toby Negrin 2014-03-20 23:36:42 UTC
Erik Z and I will discuss and prioritize.

-Toby
Comment 2 Erik Zachte 2014-03-24 16:51:13 UTC
I reckon prio 'high' is for assessment of requirements and doability. So without further ado some questions here: 

Lydia, can you please explain in more detail what this is about? The link above just points to some structured data file, without any further explanation. Is this a data dump for one article? Just guessing.

Statistics split by format. Do I understand correctly that you want as many monthly totals as there are formats, with no further granularity (I hope so)?

Where to find those numbers? Is there a table or API log which stores API requests, that you know of? Or should we be looking at general traffic logs? We have 1:1000 sampled squid log reports. Those would only work if API requests number in the hundreds of thousands per month; they are also more or less frozen in a partially functional state, as new infrastructure for traffic analysis is still expected to happen soonish.

Thanks for follow-up.
Comment 3 Bingle 2014-03-24 16:52:50 UTC
Prioritization and scheduling of this bug is tracked on Mingle card https://wikimedia.mingle.thoughtworks.com/projects/analytics/cards/cards/1491
Comment 4 Daniel Kinzler 2014-03-26 11:15:21 UTC
(In reply to Erik Zachte from comment #2)
> Lydia, can you please explain in more detail what this is about? The link
> above just points to some structured data file, without any further
> explanation. Is this a data dump for one article? Just guessing.

Yes. https://www.wikidata.org/entity/Q1.json is part of Wikidata's linked data interface. Basically, https://www.wikidata.org/entity/<id>[.<format>] URLs allow access to the machine-readable description of an entity in the given format. If the format is not given, content negotiation is applied.

In the end, these URLs are resolved to a redirect (303 and/or 302) to wiki/Special:EntityData with the appropriate parameters. E.g. the example above results in a redirect to https://www.wikidata.org/wiki/Special:EntityData/Q1.json
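
To illustrate the mapping described above, here is a minimal Python sketch that parses an /entity/<id>[.<format>] path and builds the corresponding Special:EntityData path. It is not the server's actual redirect logic: the function names, the loose ID pattern (letters followed by digits), and the choice to emit a suffix-less path when no format is given are all assumptions made for illustration.

```python
import re
from typing import Optional, Tuple

# Matches /entity/<id>[.<format>]. The ID pattern (letters followed by
# digits, e.g. Q1) is a loose assumption, not taken from the bug report.
ENTITY_PATH = re.compile(
    r"^/entity/(?P<id>[A-Za-z]+\d+)(?:\.(?P<format>[a-z0-9]+))?$"
)


def parse_entity_path(path: str) -> Optional[Tuple[str, Optional[str]]]:
    """Return (entity_id, format or None), or None if the path does not match."""
    m = ENTITY_PATH.match(path)
    if m is None:
        return None
    return m.group("id"), m.group("format")


def entity_data_path(path: str) -> Optional[str]:
    """Build the Special:EntityData path for an /entity/ path.

    Assumption: without an explicit format the suffix is simply omitted;
    on the real site the format would come from content negotiation.
    """
    parsed = parse_entity_path(path)
    if parsed is None:
        return None
    entity_id, fmt = parsed
    suffix = f".{fmt}" if fmt else ""
    return f"/wiki/Special:EntityData/{entity_id}{suffix}"
```

For example, entity_data_path("/entity/Q1.json") yields "/wiki/Special:EntityData/Q1.json", matching the redirect target quoted above.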

I suppose requests to Special:EntityData are already counted, but only as a single total, not per entity or per format.
 
> Statistics split by format. Do I understand correctly that you want as many
> monthly totals as there are formats, with no further granularity (I hope so)?

From Lydia's original description, I gather that we are not interested in per-entity counters, but only per format. Considering that the format is not always given explicitly in the original URL, it would probably be easiest to look at requests for wiki/Special:EntityData/*.<format> and base the statistics on that.
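
A rough Python sketch of that per-format counting could look like the following. It assumes the request paths have already been extracted from whatever log format is in use (the bug report does not specify one); the function name and the regular expression are hypothetical.

```python
import re
from collections import Counter
from typing import Iterable

# Matches /wiki/Special:EntityData/<anything>.<format>, optionally followed
# by a query string or fragment. The pattern is an illustrative assumption.
ENTITY_DATA = re.compile(
    r"^/wiki/Special:EntityData/[^/?#]+\.(?P<format>[A-Za-z0-9]+)(?:[?#].*)?$"
)


def count_by_format(paths: Iterable[str]) -> Counter:
    """Count Special:EntityData requests per format suffix.

    Paths without an explicit format suffix, and unrelated paths,
    are ignored.
    """
    counts: Counter = Counter()
    for path in paths:
        m = ENTITY_DATA.match(path)
        if m:
            counts[m.group("format").lower()] += 1
    return counts
```

Feeding it one month of request paths would give one total per format, which is the granularity asked for in the description.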

> Where to find those numbers? Is there a table or API log which stores API
> requests, that you know of? Or should we be looking at general traffic logs?

That's our question to you (and Dario, I guess). But this has nothing to do with the API. This is a special-purpose URL path that gets resolved to a special page. So I guess looking at the general-purpose web logs should work.
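
If the 1:1000 sampled logs mentioned in comment 2 end up being the source, the sampled counts would need to be scaled up. The Python sketch below shows that arithmetic along with a rough error estimate. It assumes sampled hits behave approximately like independent random draws, so the relative standard error is about 1/sqrt(n); the function names are hypothetical.

```python
import math

SAMPLING_RATE = 1000  # 1:1000 sampling, as stated in comment 2


def estimate_total(sampled_count: int, rate: int = SAMPLING_RATE) -> int:
    """Scale a sampled hit count up to an estimated true total."""
    return sampled_count * rate


def relative_error(sampled_count: int) -> float:
    """Approximate relative standard error of the estimate (1/sqrt(n)).

    Assumes roughly Poisson-like sampling; returns infinity when nothing
    was sampled.
    """
    if sampled_count <= 0:
        return math.inf
    return 1.0 / math.sqrt(sampled_count)
```

Under these assumptions, 100 sampled hits in a month correspond to an estimated 100,000 requests with roughly 10% relative error, which is consistent with the remark that sampled logs are only useful for formats requested hundreds of thousands of times per month.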
