Last modified: 2013-04-15 14:51:13 UTC
Take a look at: http://en.wikipedia.org/wiki/Wikipedia:Village_pump_(technical)#Sudden_drop_in_pageviews

For at least two separate, unrelated Wikipedia pages, the reported number of pageviews abruptly dropped by more than 75% near the end of March 2013, and has not recovered. Simple investigations have failed to find a cause.
Hi David, Could you please elaborate on the investigations that were conducted? I read the Village Pump page but it does not mention any investigation. Thanks! Diederik
I posted the following explanation on the Village Pump (http://en.wikipedia.org/wiki/Wikipedia:Village_pump_(technical)#Sudden_drop_in_pageviews):

Explanation: On March 25th, the Analytics Team removed SSL traffic from the udp2log stream of webrequests. This webrequest stream is consumed by webstatscollector, the tool that generates the data presented by stats.grok.se. We removed SSL traffic for two reasons:

* Each log line is tagged with a unique number that allows us to see how many log lines we lose (aka packet loss); this numbering system was not working for SSL traffic, and hence our packet loss monitoring was inadequate. You can see a clear drop in reported packet loss as a result of this fix.
* SSL traffic actually generates two hits in our log files: once when it hits the SSL server (nginx) and a second time when it hits the cache server (squid). Webstatscollector was not deduplicating these hits, so the drop in pageviews that we are seeing means that we have gone back to the actual pageview count.

So removing SSL traffic from the main webrequest stream was the cause of this drop, but it did not introduce a bug; it actually fixed a previously unknown bug that overreported SSL-generated pageviews. Thanks to Wikid77, who got me thinking about the SSL cause in the first place.

Potential Next Steps:

# WMF only recently started to enable SSL traffic, so it would be interesting to see if we can find a sudden spike in pageviews for this article.
# We can try to get Google to link to the non-SSL page; it should not impact the numbers anymore, but WMF's infrastructure is not quite ready for handling massive volumes of SSL traffic for anonymous readers.
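
To make the two reasons in the explanation concrete, here is a minimal Python sketch. It is not the actual udp2log or webstatscollector code. The function names, the `(source, page)` record layout and the `"ssl"`/`"cache"` tags are all hypothetical and chosen only for illustration. It shows how gaps in per-host sequence numbers give a packet loss estimate, and how counting both the nginx hit and the squid hit for one SSL request inflates a pageview count.

```python
from collections import Counter


def estimate_loss(seq_numbers):
    """Estimate the fraction of log lines lost for one host.

    Assumes (hypothetically) that each host numbers its log lines
    consecutively, so any gap in the received numbers is a lost line.
    """
    if not seq_numbers:
        return 0.0
    expected = max(seq_numbers) - min(seq_numbers) + 1
    received = len(set(seq_numbers))
    return 1.0 - received / expected


def count_pageviews(hits, skip_ssl=True):
    """Count hits per page from (source, page) tuples.

    `source` is a hypothetical tag: "ssl" for the hit logged by the
    SSL server and "cache" for the hit logged by the cache server.
    With skip_ssl=True the SSL-server hits are ignored, so a request
    that passed through both servers is counted once.
    """
    counts = Counter()
    for source, page in hits:
        if skip_ssl and source == "ssl":
            continue
        counts[page] += 1
    return counts


# One HTTPS request for "Foo" (logged twice), one plain request for
# "Foo", and one plain request for "Bar".
hits = [("ssl", "Foo"), ("cache", "Foo"), ("cache", "Foo"), ("cache", "Bar")]

inflated = count_pageviews(hits, skip_ssl=False)  # Foo counted 3 times
actual = count_pageviews(hits, skip_ssl=True)     # Foo counted 2 times
loss = estimate_loss([1, 2, 4, 5, 8])             # 5 of 8 lines received
```

In this toy input the HTTPS request for "Foo" is the only one counted twice, so dropping the SSL-server hits lowers the "Foo" count from 3 to 2 and leaves "Bar" unchanged.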