Last modified: 2013-11-05 07:43:02 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and for historical purposes. It is not possible to log in and except for displaying bug reports and their history, links might be broken. See T58030, the corresponding Phabricator task for complete and up-to-date bug report information.
Bug 56030 - http://reportcard.wmflabs.org/ is not updating automatically
Status: ASSIGNED
Product: Analytics
Classification: Unclassified
Component: Visualization
Version: unspecified
Hardware: All All
Importance: Unprioritized normal
Target Milestone: ---
Assigned To: Toby Negrin
Depends on:
Blocks:
Reported: 2013-10-23 01:24 UTC by Tomasz Finc
Modified: 2013-11-05 07:43 UTC
CC List: 8 users

See Also:
Web browser: ---
Mobile Platform: ---
Assignee Huggle Beta Tester: ---


Attachments
Report card (103.53 KB, image/jpeg)
2013-10-23 01:24 UTC, Tomasz Finc
November Core Graph showing issue (210.78 KB, image/png)
2013-11-01 23:02 UTC, Tomasz Finc

Description Tomasz Finc 2013-10-23 01:24:24 UTC
Created attachment 13549 [details]
Report card

Loading http://reportcard.wmflabs.org/ shows that we're in October but we only have up-to-date data through August. Screenshot attached. What's blocking us from automating this?
Comment 1 Diederik van Liere 2013-10-23 01:26:34 UTC
Prioritization and scheduling of this bug is tracked on Mingle card https://mingle.corp.wikimedia.org/projects/analytics/cards/1230
Comment 2 Dan Andreescu 2013-10-23 18:10:29 UTC
Tomasz, the reportcard works as follows:

The meeting titled "Month X Metrics Meeting" will happen at the beginning of month X.  Therefore, the most recent data it can have is month X-1 pageview data and month X-2 wikistats data.  The wikistats data is resource intensive and can't be computed for month X-1 in time for this meeting.  Also for some months, pageview data doesn't get delivered because there are problems with wikistats data that require manual intervention; in those cases we use month X-2 pageview data.
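
The month arithmetic in the paragraph above can be sketched roughly as follows. This is only an illustration of the rule as stated in this comment; the function name `latest_data_months` and the `pageview_fallback` flag are hypothetical and do not come from the reportcard or wikistats code.

```python
from datetime import date


def latest_data_months(meeting_month: date, pageview_fallback: bool = False):
    """Return ((year, month) of pageview data, (year, month) of wikistats data).

    Assumption (from the comment above): a meeting in month X can show at
    most month X-1 pageview data and month X-2 wikistats data. When
    pageview delivery fails, pageview data falls back to month X-2.
    """

    def months_back(d: date, n: int):
        # Count months from year 0 so that year boundaries are handled.
        idx = d.year * 12 + (d.month - 1) - n
        return (idx // 12, idx % 12 + 1)

    pageview = months_back(meeting_month, 2 if pageview_fallback else 1)
    wikistats = months_back(meeting_month, 2)
    return pageview, wikistats


# Example: the November 2013 metrics meeting.
print(latest_data_months(date(2013, 11, 1)))
```

Under these assumptions, a November meeting would show October pageview data and September wikistats data at best.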

We have suggested a solution for this, which is to simply move the computation to hadoop.  That migration is tracked by these two epics:

https://mingle.corp.wikimedia.org/projects/analytics/cards/1126
https://mingle.corp.wikimedia.org/projects/analytics/cards/1125

However, as of now, those epics have not been prioritized above our other work.  I think to answer "What's blocking us automating this?", I would just say "Too many other - higher priority - requests for analytics engineering resources, coupled with not enough analytics resources".

If you'd like more info about how hard those epics would be to implement, or what all these other requests are, just ping me in IRC or over email - I'm happy to talk.
Comment 3 Erik Zachte 2013-10-23 19:39:38 UTC
"The wikistats data is resource intensive and can't be computed for month X-1 in time for this meeting." 
This is actually due to the dumps, which take up to three weeks into the next month to arrive. Only a live stream of all relevant tables to hadoop could fix that.

"to simply move the computation to hadoop": I'm not sure how simple that would be. I've seen some wild optimism in earlier estimates on hadoop, but for sure this is where we want to go.

"Also for some months, pageview data doesn't get delivered because there are problems with wikistats data that require manual intervention; in those cases we use month X-2 pageview data." This is a mistake. Page view counts are updated every day. There is a manual step in Limn (and for some files in Wikistats, but page view data could be sent automatically right now). So only when the Metrics Meeting is on the 1st or 2nd day of the month does the latest pageview data not make it into Limn. Updating Limn after the Metrics Meeting could also help.
Comment 4 Tomasz Finc 2013-11-01 18:45:39 UTC
It's now November and all of our core metrics are still only updated to August. 

When will the current run finish?
Comment 5 Erik Zachte 2013-11-01 21:52:29 UTC
Which core metrics?

Dump based stats are up to date for September. (F5 may be needed)
http://stats.wikimedia.org/EN/Sitemap.htm

(At the recent Quarterly Analytics Review Meeting I proposed once again to make the dump process smarter; we can and should have stub dumps within days after the close of the month)

Page view stats are updated daily, as they have been for years.
http://stats.wikimedia.org/EN/TablesPageViewsMonthly.htm

Squid log based reports are up to date to September
http://stats.wikimedia.org/wikimedia/squids/SquidReportRequests.htm

Geo based reports likewise
http://stats.wikimedia.org/wikimedia/squids/SquidReportPageViewsPerCountryOverview2013Q3.htm

For October (which was ongoing) we need to do some serious investigating; it seems there were external issues with our ip->geo lookup. We just found out about that.
Comment 6 Tomasz Finc 2013-11-01 22:58:48 UTC
(In reply to comment #5)
> Which core metrics?

The default "Core" tab referred to in the url of this bug
Comment 7 Tomasz Finc 2013-11-01 23:02:44 UTC
Created attachment 13664 [details]
November Core Graph showing issue
Comment 8 Erik Zachte 2013-11-01 23:21:51 UTC
Comment on attachment 13664 [details]
November Core Graph showing issue

Ah, comScore stats are published around the 20th of the month. Two people need to take some manual steps to get this into Limn. I prep all data for Limn in one go for efficiency's sake. If someone with less backlog (BTW I work 1/3 FTE) wants to take over, I'd be happy to explain how to do this.
Comment 9 Tomasz Finc 2013-11-01 23:26:03 UTC
(In reply to comment #8)
> Comment on attachment 13664 [details]
> November Core Graph showing issue
> 
> Ah, comScore stats are published around the 20th of the month. Two people need
> to take some manual steps to get this into Limn. I prep all data for Limn in
> one go for efficiency's sake. If someone with less backlog (BTW I work 1/3
> FTE) wants to take over, I'd be happy to explain how to do this.

Thanks for the info. All graphs on the page run into this problem including our own data sets. comScore just happens to be at the top.
Comment 10 Toby Negrin 2013-11-02 00:00:47 UTC
This was brought up at the Analytics Quarterly review and we are working with Ken/ops to address the root cause.
