Last modified: 2013-04-15 14:51:13 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and kept for historical purposes. It is not possible to log in, and apart from the pages displaying bug reports and their history, links might be broken. See T49227, the corresponding Phabricator task, for complete and up-to-date bug report information.
Bug 47227 - sudden drop in pageviews
Status: RESOLVED WORKSFORME
Product: Analytics
Classification: Unclassified
Component: General/Unknown
Version: unspecified
Hardware: All All
Importance: Unprioritized normal
Target Milestone: ---
Assigned To: Nobody - You can work on this!
Depends on:
Blocks:
Reported: 2013-04-15 02:49 UTC by David Williams
Modified: 2013-04-15 14:51 UTC
CC List: 2 users

See Also:
Web browser: ---
Mobile Platform: ---
Assignee Huggle Beta Tester: ---


Description David Williams 2013-04-15 02:49:52 UTC
Take a look at:

http://en.wikipedia.org/wiki/Wikipedia:Village_pump_(technical)#Sudden_drop_in_pageviews

For at least two separate, unrelated Wikipedia pages, the reported number of pageviews abruptly dropped by more than 75% near the end of March, 2013, and has not recovered. Simple investigations have failed to find a cause.
Comment 1 Diederik van Liere 2013-04-15 03:08:29 UTC
Hi David,

Could you please elaborate on the investigations that were conducted? I read the Village Pump page but it does not mention any investigation.

Thanks!
Diederik
Comment 2 Diederik van Liere 2013-04-15 14:51:13 UTC
I posted the following explanation on the Village Pump (http://en.wikipedia.org/wiki/Wikipedia:Village_pump_(technical)#Sudden_drop_in_pageviews):

Explanation: On March 25th, the Analytics Team removed SSL traffic from the udp2log stream of webrequests. This webrequest stream is consumed by webstatscollector, the tool that generates the data presented by stats.grok.se. We removed SSL traffic for two reasons:
* Each log line is tagged with a unique sequence number that lets us see how many log lines we lose (packet loss). This numbering system was not working for SSL traffic, so our packet-loss monitoring was inadequate. The packet-loss reports show a clear drop as a result of this fix.
* SSL traffic actually generates two hits in our log files: one when the request reaches the SSL server (nginx) and a second when it reaches the cache server (squid). Webstatscollector was not deduplicating these hits, so the drop in pageviews we are seeing means the counts have gone back to the actual number of pageviews.
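The packet-loss check in the first point relies on each log line carrying a sequence number, so that gaps in a host's numbering reveal lines that never arrived. A minimal sketch in Python, assuming simplified `(hostname, sequence_number)` records rather than the real udp2log line format, and assuming each host numbers its lines consecutively with no wraparound (presumably the property that broke for SSL traffic):

```python
from collections import defaultdict


def estimate_packet_loss(records):
    """Estimate the fraction of log lines lost per host.

    `records` is an iterable of (hostname, sequence_number) pairs. This
    record shape is a simplifying assumption, not the udp2log format.
    """
    seen = defaultdict(set)
    for host, seq in records:
        seen[host].add(seq)

    loss = {}
    for host, seqs in seen.items():
        # Assuming consecutive numbering per host, the span between the
        # smallest and largest sequence number is how many lines were sent.
        expected = max(seqs) - min(seqs) + 1
        loss[host] = 1 - len(seqs) / expected
    return loss
```

For example, a host whose received lines are numbered 1, 2, 4 and 5 is missing one of five lines, an estimated loss of about 20%. If a host's numbers are not consecutive to begin with, the estimate is meaningless, which is why broken numbering for SSL traffic undermined the monitoring.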

So removing SSL traffic from the main webrequest stream caused this drop, but it did not introduce a bug; it actually fixed a previously unknown bug that overreported SSL-generated pageviews. Thanks to Wikid77, who got me thinking about SSL as the cause in the first place.
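The overreporting can be illustrated by counting pageviews from log lines with and without the SSL terminators' entries. This is a minimal sketch, not webstatscollector's actual logic: the `(hostname, url)` record shape and the `SSL_TERMINATORS` hostnames are hypothetical, and it assumes every HTTPS request produces one line from nginx and one from squid.

```python
from collections import Counter

# Hypothetical hostnames of the SSL terminators (nginx); the real host
# names are not given in this bug report.
SSL_TERMINATORS = {"ssl1001", "ssl1002"}


def count_pageviews(records, drop_ssl_terminators=True):
    """Count views per URL from (hostname, url) log records."""
    counts = Counter()
    for host, url in records:
        if drop_ssl_terminators and host in SSL_TERMINATORS:
            # The same HTTPS request is also logged by a cache server
            # (squid), so counting this line would count the view twice.
            continue
        counts[url] += 1
    return counts
```

With one HTTPS view of a page (logged by both `ssl1001` and a cache host) and one plain HTTP view (logged by a second cache host), counting every line reports 3 views, while skipping the terminator lines reports the actual 2.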

Potential Next Steps:
# WMF only recently started enabling SSL traffic, so it would be interesting to see whether we can find a corresponding sudden spike in pageviews for this article.
# We can try to get Google to link to the non-SSL page. SSL traffic should no longer affect the numbers, but WMF's infrastructure is not yet ready to handle massive volumes of SSL traffic from anonymous readers.
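The first next step amounts to looking for a level shift in an article's daily pageview series: a spike when SSL was enabled, or a drop like the one reported. A minimal sketch, assuming the daily counts are already available as a list of numbers however they were obtained; the function name and the seven-day window are illustrative choices, not part of any existing tool:

```python
import math


def largest_level_shift(daily_counts, window=7):
    """Return (index, ratio) for the day where the mean of the next
    `window` days differs most from the mean of the previous `window` days.

    ratio < 1 is a drop (0.25 means a 75% drop); ratio > 1 is a spike.
    Returns (None, 1.0) if no shift is found.
    """
    if len(daily_counts) < 2 * window:
        raise ValueError("need at least 2 * window days of data")

    best_index, best_ratio, best_score = None, 1.0, 0.0
    for i in range(window, len(daily_counts) - window + 1):
        before = sum(daily_counts[i - window:i]) / window
        after = sum(daily_counts[i:i + window]) / window
        if before == 0 or after == 0:
            continue  # ratio undefined or infinite; skip this split point
        ratio = after / before
        # abs(log(ratio)) scores a halving and a doubling equally.
        score = abs(math.log(ratio))
        if score > best_score:
            best_index, best_ratio, best_score = i, ratio, score
    return best_index, best_ratio
```

Comparing multi-day means rather than adjacent days keeps a single unusual day from being reported as a shift, at the cost of needing at least `2 * window` days of data.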


