Last modified: 2014-10-22 20:21:40 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and for historical purposes. It is not possible to log in and except for displaying bug reports and their history, links might be broken. See T73255, the corresponding Phabricator task for complete and up-to-date bug report information.
Bug 71255 - Story: WikimetricsUser downloads large CSV
Status: RESOLVED FIXED
Product: Analytics
Classification: Unclassified
Component: Wikimetrics
Version: unspecified
Hardware: All All
Importance: High major
Target Milestone: ---
Assigned To: nuria
Whiteboard: u=WikimetricsUser c=Wikimetrics p=8 s...
Keywords:
Depends on:
Blocks:
Reported: 2014-09-24 21:58 UTC by Kevin Leduc
Modified: 2014-10-22 20:21 UTC (History)
CC List: 9 users

See Also:
Web browser: ---
Mobile Platform: ---
Assignee Huggle Beta Tester: ---


Attachments
Cohort of student editors from fall 2014 on enwiki (22.02 KB, text/csv)
2014-10-10 18:15 UTC, Sage Ross

Comment 1 Kevin Leduc 2014-09-24 22:00:34 UTC
Dan's comment:
This seems related to the size of the download.  I tried downloading a smaller result and it was fine.  The CSV takes longer to generate than the JSON because the native storage format is JSON.  Most likely it would download eventually but it might take an unreasonable amount of time.  The fix would be to optimize the conversion to CSV, maybe go to a streaming converter like Yuvi implemented in Quarry.
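
As an illustration of the streaming idea mentioned above, here is a minimal Python sketch that emits CSV one line at a time instead of building the whole file in memory first. It is not the Wikimetrics or Quarry code; the `stream_csv` function, the field names, and the sample rows are hypothetical, since the report layout is not shown in this bug.

```python
import csv
import io


def stream_csv(rows, fieldnames):
    """Yield CSV text one line at a time instead of building the whole file."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)

    def flush():
        # Hand back what the writer produced, then empty the buffer for reuse.
        text = buffer.getvalue()
        buffer.seek(0)
        buffer.truncate(0)
        return text

    writer.writerow(fieldnames)
    yield flush()
    for row in rows:
        # Missing fields become empty cells rather than raising KeyError.
        writer.writerow([row.get(name, "") for name in fieldnames])
        yield flush()


# Hypothetical per-user results; the real report layout is an assumption.
results = [
    {"user": "ExampleUserA", "bytes_added": 1200},
    {"user": "ExampleUserB", "bytes_added": 0},
]
chunks = list(stream_csv(iter(results), ["user", "bytes_added"]))
```

A generator like this can be handed to a web framework's streaming response (Flask's `Response`, for instance, accepts a generator as its body), so the client starts receiving data before the last row has been converted.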
Comment 2 Sage Ross 2014-10-10 18:15:08 UTC
Created attachment 16742
Cohort of student editors from fall 2014 on enwiki

A cohort of 2095 usernames (2090 are valid)
Comment 3 Sage Ross 2014-10-10 18:15:52 UTC
In many cases it never downloads at all. After many minutes, the user gets a 504 Gateway Timeout error.

For example: https://metrics.wmflabs.org/reports/result/c04cf328-f198-4a12-81ec-9cb4badefc46.csv

vs. json which works fine: https://metrics.wmflabs.org/static/public/1504496.json

This is a cohort of 2090 users, with individual results for bytes added over a span of about 45 days.

Unfortunately, using JSON is not a viable alternative because it does not include the usernames like CSV does.
Comment 4 Kevin Leduc 2014-10-16 15:34:54 UTC
Collaborative tasking done on etherpad:
http://etherpad.wikimedia.org/p/analytics-71255
Comment 5 Gerrit Notification Bot 2014-10-18 01:36:57 UTC
Change 167356 had a related patch set uploaded by Nuria:
[WIP] Improving retrieval of user names on csv report

https://gerrit.wikimedia.org/r/167356
Comment 6 nuria 2014-10-20 15:43:48 UTC
Bug was reproducible running "pages created" with per-user results on the cohort attached to the bug.
Comment 7 nuria 2014-10-20 15:48:30 UTC
The code changes fix the performance issues. We now need to do some refactoring to see whether similar changes can be applied to the JSON report.
Comment 8 nuria 2014-10-20 22:27:27 UTC
Verified in staging that the CSV report for 2000 users (with per-user results) gets created in a couple of seconds.

Tested a variety of reports with the cohort attached to this bug, and all of them ran in seconds. Made sure to test the timeseries report too.
Comment 9 Gerrit Notification Bot 2014-10-22 20:19:23 UTC
Change 167356 merged by Milimetric:
Improves retrieval of user names on csv report

https://gerrit.wikimedia.org/r/167356
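
The change title points at user name retrieval as the slow step, but the patch itself is not reproduced in this thread. As a generic sketch only (not the actual Wikimetrics change), one common way to speed up such retrieval is to replace one query per user with a few batched queries. The `users` table, its columns, and both function names below are hypothetical.

```python
import sqlite3


def names_one_by_one(conn, user_ids):
    """One query per user: simple, but N round trips for N users."""
    names = {}
    for user_id in user_ids:
        row = conn.execute(
            "SELECT user_name FROM users WHERE user_id = ?", (user_id,)
        ).fetchone()
        if row:
            names[user_id] = row[0]
    return names


def names_batched(conn, user_ids, batch_size=500):
    """Fetch names in chunks with IN (...), so a few queries cover all users."""
    names = {}
    ids = list(user_ids)
    for start in range(0, len(ids), batch_size):
        batch = ids[start:start + batch_size]
        placeholders = ",".join("?" * len(batch))
        query = (
            "SELECT user_id, user_name FROM users "
            f"WHERE user_id IN ({placeholders})"
        )
        # Each result row is a (user_id, user_name) pair.
        names.update(conn.execute(query, batch).fetchall())
    return names


# Hypothetical in-memory table sized like the cohort discussed in this bug.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (user_id INTEGER PRIMARY KEY, user_name TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?)",
    [(i, f"User{i}") for i in range(1, 2091)],
)
lookup = names_batched(conn, range(1, 2091))
```

With a batch size of 500, the 2090-user cohort needs five queries instead of 2090; the batch size is an arbitrary choice kept below common limits on bound parameters.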
Comment 10 Dan Andreescu 2014-10-22 20:21:40 UTC
will be deployed after sprint demo
