Last modified: 2014-03-11 16:04:26 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and for historical purposes. It is not possible to log in and except for displaying bug reports and their history, links might be broken. See T48265, the corresponding Phabricator task for complete and up-to-date bug report information.
Bug 46265 - Erik Zachte's list of discrepancies for old vs new traffic reports
Status: NEW
Product: Analytics
Classification: Unclassified
Component: Wikistats
Version: unspecified
Hardware: All All
Importance: Low normal
Assigned To: Nobody - You can work on this!

Reported: 2013-03-18 12:23 UTC by Erik Zachte
Modified: 2014-03-11 16:04 UTC (History)

Description Erik Zachte 2013-03-18 12:23:50 UTC
migrated from Asana 
The bug is partially solved; all comments are copied 'as is' from Asana.

Stefan Petrea created task. Dec 19, 2012
Stefan Petrea Report SquidReportRequests.htm:
 
1) One issue:
 
old totals: text/vnd.wap.wml 72 M, application/x-www-form-urlencoded total 26.2 M
new totals: text/vnd.wap.wml 26.2 M, application/x-www-form-urlencoded total 72 M
 
so values have miraculously been swapped
I have no idea if old or new report is telling the truth here
 
-----------------------------------------------
Report SquidReportOrigins.htm:
 
Section: Requests with external origins:
2) new report shows strange domains, like 2620-, 2001-, 2a01, etc
 
3)
totals old report: 136,377 M, pages 6,828 M, images 107,737 M, other 21,812 M
totals new report: 137,541 M, pages 6,829 M, images 107,745 M, other 22,968 M
I'd say 'other' is a significant difference between old and new, worth investigating
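
The size of each gap in item 3 can be checked with a few lines of arithmetic. The following is a minimal sketch in Python that uses only the figures quoted above (in millions of requests); the dictionary and function names are illustrative and are not part of the wikistats code.

```python
# Totals quoted in item 3, in millions of requests.
old = {"total": 136_377, "pages": 6_828, "images": 107_737, "other": 21_812}
new = {"total": 137_541, "pages": 6_829, "images": 107_745, "other": 22_968}


def deltas(old_totals, new_totals):
    """Return {category: (absolute difference, relative difference in %)}."""
    result = {}
    for key, old_value in old_totals.items():
        diff = new_totals[key] - old_value
        result[key] = (diff, round(100.0 * diff / old_value, 2))
    return result


report_deltas = deltas(old, new)
for category, (diff, pct) in report_deltas.items():
    print(f"{category:>6}: {diff:+6d} M ({pct:+.2f}%)")
```

On these figures, 'other' accounts for 1,156 M of the 1,164 M difference in the grand total, about 5.3% more than in the old report, while pages and images each differ by less than 0.02%.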
 
-----------------------------------------------
Report SquidReportDevices.htm
 
4) PM: broken in old and new (is another bug, can be addressed separately)
 
-----------------------------------------------
 
Report SquidReportClients.htm
 
5) many discrepancies:
 
old has many more tablets (3.35% vs 0.71% in new report)
 
Safari/Android/Mozilla are missing in new report in 'browsers, tablets'
Safari in 'browsers, other mobile' is 5.73% in old report and 8.13% in new
Safari occurs twice in old report in section 'browsers, other mobile'
Total for 'Mobile applications' is much higher (10x) in old report, some entries are missing in new report
'Browser version, tablet': major mismatches between old and new
'Mobile application versions' likewise
 
-----------------------------------------------
 
Report SquidReportGoogle.htm
 
6) GoogleBot?/Other is 21.8 M in old report / 28.4 M in new (GoogleBot? stands for probable impostors, using agent string GoogleBot but from an unexpected IP address)
 
-----------------------------------------------
 
Report SquidReportCountryData.htm
 
7) new report: Global North/South Android is much lower in new report. Also, N+S don't add up to the global total, not even close (in sections 'All pageviews' and 'To mobile site')
 
8) again column tablets is way out of sync, old values are 5 times as high as new, but that is just another manifestation of the bug already mentioned for the other report
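
The consistency rule implied by item 7 (Global North plus Global South should come close to the global total) can be written as a small check. This is a sketch only: the function name, the tolerance, and the sample numbers are assumptions made for illustration and do not come from either report or from the wikistats code.

```python
def regions_add_up(global_total, north, south, tolerance=0.01):
    """Return True when north + south is within `tolerance` of global_total.

    `tolerance` is a fraction of the global total (0.01 means 1%). It leaves
    room for requests that could not be assigned to either region.
    """
    if global_total == 0:
        return north + south == 0
    return abs((north + south) - global_total) / global_total <= tolerance


# Hypothetical figures, not taken from any report.
print(regions_add_up(1000, 700, 295))  # 995 vs 1000, inside the 1% tolerance
print(regions_add_up(1000, 400, 300))  # 700 vs 1000, far outside it
```

A check like this could be run per section ('All pageviews', 'To mobile site') so that a mismatch is flagged when the report is generated rather than found by eye.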
 
Dec 19, 2012 at 3:41pm • 
Stefan Petrea marked today. Dec 19, 2012
Stefan Petrea 1, 2, 7 are solved and 5 is underway

Dec 19, 2012 at 3:43pm • 
Stefan Petrea 5 solved as well
Dec 20, 2012 at 6:36pm • 
Stefan Petrea waiting for review https://gerrit.wikimedia.org/r/#/c/39587/

Dec 20, 2012 at 6:36pm • 
Stefan Petrea 4 solved as well (through ^^ gerrit patchset). MobileDeviceTypes.csv was being read from a non-existent path (due to some csv files being moved to csv/meta)

Dec 20, 2012 at 7:06pm • 
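
A lookup file read from a path that no longer exists, as described for item 4 above, is easier to notice when the lookup fails loudly. The sketch below is in Python and is not the wikistats code or the gerrit patch; `find_csv` and the directory list in the usage comment are hypothetical names chosen for illustration.

```python
from pathlib import Path


def find_csv(name, search_dirs):
    """Return the first existing file called `name` in `search_dirs`.

    Raises FileNotFoundError instead of returning a path that does not
    exist, so a moved file is reported at startup rather than producing
    an empty report section.
    """
    for directory in search_dirs:
        candidate = Path(directory) / name
        if candidate.is_file():
            return candidate
    searched = ", ".join(str(d) for d in search_dirs)
    raise FileNotFoundError(f"{name} not found in: {searched}")


# Example call (directory names are assumptions):
# path = find_csv("MobileDeviceTypes.csv", ["csv", "csv/meta"])
```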
Stefan Petrea Some of the totals in SquidReportClients.htm are >100%, I need to fix that, started writing tests for it. I was able to identify a small dataset to reproduce the problem. Solution is underway.

Dec 21, 2012 at 4:46pm • 
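
The kind of test mentioned in the comment above can be sketched as follows. The thread does not say why the totals exceed 100%; the sketch assumes one possible cause, a request being counted under two categories, and the function names and the dataset are hypothetical.

```python
def section_shares(counts, total_requests):
    """Convert per-category request counts into percentages of total_requests."""
    return {name: 100.0 * count / total_requests for name, count in counts.items()}


def shares_exceed_100(percentages, tolerance=0.01):
    """Return True when the percentages in one section sum to more than 100%."""
    return sum(percentages.values()) > 100.0 + tolerance


# Hypothetical dataset: 100 requests, of which 10 are counted under both
# 'tablet' and 'other mobile', so the category counts sum to 110.
counts = {"non mobile": 70, "tablet": 20, "other mobile": 20}
percentages = section_shares(counts, total_requests=100)
print(shares_exceed_100(percentages))
```

Here the shares sum to 110%, so the check reports a problem; with each request counted exactly once they would sum to 100%.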
Stefan Petrea I have also discussed with Erik what each of the totals in SquidReportClients.htm means. At the moment my understanding is that some of the knowledge related to those totals was lost. We will have to do something to get it back. Options are digging through code, contacting a former Director of Mobile, or re-defining them so we can document them and after that we know what all the numbers mean.…
Dec 21, 2012 at 4:48pm • 
Stefan Petrea Forgot to mention that they're not obvious to me. If we can have another discussion on them that would be great. I will be digging through the code in the meantime.
Dec 21, 2012 at 4:50pm • 
Erik Zachte Yes, it is puzzling for me as well. Here is the gist of what we discussed last night:

Till about a year ago the reports were pretty intuitive (Stefan agreed on this):

http://stats.wikimedia.org/archive/squid_reports/2011-08/SquidReportClients.htm
There were
list 'Browsers, non mobile'            html 87.6%
list 'Browsers, mobile'                   html 14.2%
list 'Browser version, non mobile' html…

Dec 21, 2012 at 5:22pm
Stefan Petrea unmarked today. Feb 22
Comment 1 Erik Zachte 2014-03-11 16:04:26 UTC
These new versions of the reports were reverted more than a year ago, as no one could muster the time to fix the open issues. Right now maintenance on squid log based reports is still on hold, pending replacement or restructuring in a Hadoop context.
