Last modified: 2014-04-21 18:45:57 UTC
The timestamp in the filename of the hourly webstatscollector output files refers to the end of the covered period, not its start. For example, http://dumps.wikimedia.org/other/pagecounts-raw/2014/2014-02/pagecounts-20140206-110000.gz covers 2014-02-06 10:00 until 2014-02-06 11:00. This is confusing to me, as I'd expect the above file to cover 2014-02-06 11:00 until 2014-02-06 12:00 instead. From time to time, I see other people tripping over this as well. For example, when trying to relate the above file to the sampled-1000 TSVs, they grep for 2014-02-06T11 in the timestamp field, when they would have to grep for 2014-02-06T10 instead. Could we either document this clearly on http://dumps.wikimedia.org/other/pagecounts-raw/ or switch the filename to hold the start of the covered period?
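To illustrate the off-by-one-hour mapping described above, here is a minimal Python sketch that derives the covered period and the matching sampled-1000 grep prefix from a pagecounts filename. The function names `covered_period` and `sampled_log_prefix` are hypothetical and only for illustration. The sketch assumes the filename timestamp falls exactly on the hour, that each file covers exactly one hour, and that the sampled logs use timestamps of the form `2014-02-06T10:...`, as in the example above.

```python
import re
from datetime import datetime, timedelta


def covered_period(filename):
    """Return (start, end) of the hour covered by a pagecounts file.

    Assumption: the timestamp in the filename marks the END of the
    covered hour, as described in this report.
    """
    match = re.search(r"pagecounts-(\d{8}-\d{6})", filename)
    if match is None:
        raise ValueError("not a pagecounts filename: %r" % filename)
    end = datetime.strptime(match.group(1), "%Y%m%d-%H%M%S")
    start = end - timedelta(hours=1)
    return start, end


def sampled_log_prefix(filename):
    """Return the hour prefix to grep for in the sampled logs' timestamp field."""
    start, _ = covered_period(filename)
    return start.strftime("%Y-%m-%dT%H")


if __name__ == "__main__":
    name = "pagecounts-20140206-110000.gz"
    start, end = covered_period(name)
    print("%s covers %s until %s" % (name, start, end))
    print("grep prefix: %s" % sampled_log_prefix(name))
```

Note that for a file stamped at midnight, the prefix falls on the previous day, which is another place where grepping for the filename's own date and hour would silently miss the data.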
Prioritization and scheduling of this bug is tracked on Mingle card https://wikimedia.mingle.thoughtworks.com/projects/analytics/cards/cards/1433