Last modified: 2014-10-22 20:21:40 UTC
Example: the CSV at https://metrics.wmflabs.org/reports/result/437b30dd-f535-4d7e-b460-eefabaa07b2a.csv won’t download; the JSON for the same report is at https://metrics.wmflabs.org/reports/result/437b30dd-f535-4d7e-b460-eefabaa07b2a.json
Dan's comment: This seems related to the size of the download. I tried downloading a smaller result and it was fine. The CSV takes longer to generate than the JSON because the native storage format is JSON. Most likely it would download eventually but it might take an unreasonable amount of time. The fix would be to optimize the conversion to CSV, maybe go to a streaming converter like Yuvi implemented in Quarry.
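To illustrate the streaming idea mentioned above, here is a minimal sketch in Python. It is not the Quarry or Wikimetrics code; the function name `stream_csv` and the column names are assumptions made for this example. The point is only that each row is rendered and handed off as soon as it is available, so the whole CSV is never built in memory before the download starts.

```python
import csv
import io


def stream_csv(header, rows):
    """Yield CSV text one line at a time (hypothetical helper, not project code).

    `rows` may be any iterable, including a lazy generator over stored results.
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer)

    def render(row):
        # Write one row into the reusable buffer, return it, then reset the buffer.
        writer.writerow(row)
        line = buffer.getvalue()
        buffer.seek(0)
        buffer.truncate(0)
        return line

    yield render(header)
    for row in rows:
        yield render(row)


if __name__ == "__main__":
    # Illustrative data only; real per-user results would come from the report.
    sample = [("Alice", 10), ("Bob", 20)]
    for line in stream_csv(["user", "bytes_added"], sample):
        print(line, end="")
```

In a web framework that accepts generator responses (Flask's `Response` does), a generator like this can be passed as the response body so the client starts receiving data immediately rather than waiting for the full conversion.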
Created attachment 16742: Cohort of student editors from fall 2014 on enwiki. A cohort of 2095 usernames (2090 are valid).
In many cases it never downloads: after many minutes, the user gets a 504 gateway error. For example, this CSV fails: https://metrics.wmflabs.org/reports/result/c04cf328-f198-4a12-81ec-9cb4badefc46.csv while the JSON works fine: https://metrics.wmflabs.org/static/public/1504496.json This is a cohort of 2090 users, with individual results for bytes added over a span of about 45 days. Unfortunately, using JSON is not a viable alternative because, unlike the CSV, it does not include the usernames.
Collaborative tasking done on etherpad: http://etherpad.wikimedia.org/p/analytics-71255
Change 167356 had a related patch set uploaded by Nuria: i[WIP] Improving retrieval of user names on cvs report https://gerrit.wikimedia.org/r/167356
The bug was reproducible by running "pages created" with per-user results on the cohort attached to the bug.
The code changes fix the performance issues. Next we need to do some refactoring to see whether similar changes can be applied to the JSON report.
Verified in staging that the CSV report for 2000 users (with per-user results) gets created in a couple of seconds. Tested a variety of reports with the cohort attached to this bug, and all of them run in seconds. Made sure to test the timeseries report too.
Change 167356 merged by Milimetric: Improves retrieval of user names on csv report https://gerrit.wikimedia.org/r/167356
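The patch itself is not quoted in this thread, so the following is only a hypothetical illustration of one common way that user name retrieval for a per-user report can be made faster: resolving all names in a single bulk lookup instead of one lookup per row. The names `attach_user_names` and `fetch_names` are invented for this sketch and are not taken from the Wikimetrics code.

```python
def attach_user_names(results, fetch_names):
    """Pair each per-user result with a user name using ONE bulk lookup.

    results:     dict mapping user id -> metric value (assumed shape)
    fetch_names: callable taking a list of user ids and returning
                 a dict {user_id: user_name} (hypothetical interface)
    """
    names = fetch_names(list(results))  # single round trip for all users
    # Fall back to the id itself when a name is missing.
    return [(names.get(uid, str(uid)), value) for uid, value in results.items()]


if __name__ == "__main__":
    # Stand-in for a database query; counts how often it is called.
    calls = []

    def fake_fetch(ids):
        calls.append(list(ids))
        directory = {1: "Alice", 2: "Bob"}
        return {uid: directory[uid] for uid in ids if uid in directory}

    rows = attach_user_names({1: 10, 2: 20, 3: 5}, fake_fetch)
    print(rows)
    print("lookups performed:", len(calls))
```

With a cohort of roughly 2000 users, the difference between one lookup and 2000 separate lookups could plausibly account for a timeout, but that is an assumption about the cause rather than something stated in the comments here.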
Will be deployed after the sprint demo.