Last modified: 2014-02-06 12:47:26 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and kept for historical purposes. It is not possible to log in, and apart from displaying bug reports and their history, links might be broken. See T59851, the corresponding Phabricator task, for complete and up-to-date bug report information.
Bug 57851 - Merging hourly pagecount files fails most days since a few weeks.
Status: RESOLVED FIXED
Product: Analytics
Classification: Unclassified
Component: Wikistats
Version: unspecified
Hardware: All All
Importance: Unprioritized normal
Target Milestone: ---
Assigned To: Erik Zachte
Reported: 2013-12-02 17:39 UTC by Erik Zachte
Modified: 2014-02-06 12:47 UTC

Description Erik Zachte 2013-12-02 17:39:35 UTC
Christian:

I just noticed that the November directory of the pagecounts-ez/merged files at:

  http://dumps.wikimedia.org/other/pagecounts-ez/merged/2013/2013-11/

looks wrong. Many of the files end in ".~" instead of ".bz2".
Also, the timestamps differ from previous months. For example, each of the files in

  http://dumps.wikimedia.org/other/pagecounts-ez/merged/2013/2013-09/

was created on the day following the date in its file name.
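
A check like the one described above can be scripted. The following is a minimal sketch in Python, assuming the file names of a month directory are already available as a list of strings. The function name `split_by_suffix` and the sample file names are hypothetical and are not taken from the actual directory listing.

```python
def split_by_suffix(names):
    """Split file names into those ending in ".bz2" and those ending in ".~"."""
    bz2_files = [name for name in names if name.endswith(".bz2")]
    tilde_files = [name for name in names if name.endswith(".~")]
    return bz2_files, tilde_files


# Hypothetical file names, for illustration only.
sample = ["pagecounts-2013-10-23.bz2", "pagecounts-2013-10-24.~"]
bz2_files, tilde_files = split_by_suffix(sample)
print("bz2:", bz2_files)
print("tilde:", tilde_files)
```

Any name in the second list marks a day that would need a closer look.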

P.S.: I noticed that the problem seems to have started in October:

  http://dumps.wikimedia.org/other/pagecounts-ez/merged/2013/2013-10/

There, the 2013-10-24 file is not a ".bz2" but a ".~".
That date struck me. Although it is probably completely unrelated, we had (for the first time) a strange log line in the zero logs on that same day: the timestamp of a log line was mangled [1].
We're seeing such requests more and more these days.



[1]
___________________________________________________________
qchris@stat1002 // 0 // 20:18:05
cwd: ~
zcat /a/squid/archive/zero/zero.tsv.log-20131024 | cut -f 3 | grep -C 5 201cp3011
2013-10-23T13:29:23
2013-10-23T13:29:23
2013-10-23T13:29:23
2013-10-23T13:29:23
2013-10-23T13:29:23
201cp3011.esams.wikimedia.org
2013-10-23T13:29:24
2013-10-23T13:29:24
2013-10-23T13:29:24
2013-10-23T13:29:24
2013-10-23T13:29:24
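
The manual `cut` and `grep` search above could also be done generically, without knowing the mangled value in advance. The following is a minimal Python sketch, assuming the timestamp is the third tab-separated field (as `cut -f 3` implies) and that valid values look like `2013-10-23T13:29:23`. The function name `mangled_timestamps` and the placeholder values in the first two columns are hypothetical.

```python
import re

# Timestamp shape inferred from the excerpt above (an assumption).
TIMESTAMP = re.compile(r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}")


def mangled_timestamps(lines, column=2):
    """Yield (line_number, value) for rows whose timestamp field is malformed."""
    for number, line in enumerate(lines, start=1):
        fields = line.rstrip("\n").split("\t")
        # A row with too few fields is reported with an empty value.
        value = fields[column] if len(fields) > column else ""
        if not TIMESTAMP.fullmatch(value):
            yield number, value


# Third-column values are taken from the excerpt; "x" and "y" are placeholders.
rows = [
    "x\ty\t2013-10-23T13:29:23",
    "x\ty\t201cp3011.esams.wikimedia.org",
    "x\ty\t2013-10-23T13:29:24",
]
print(list(mangled_timestamps(rows)))
```

Because it accepts any iterable of lines, the same function could be fed from an open file or a decompressed stream.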
Comment 1 Diederik van Liere 2013-12-02 17:43:05 UTC
Prioritization and scheduling of this bug is tracked on Mingle card https://mingle.corp.wikimedia.org/projects/analytics/cards/1287
Comment 2 Toby Negrin 2013-12-19 00:59:28 UTC
Any idea of the impact of this issue? Is it a problem?

-Toby
Comment 3 Erik Zachte 2013-12-19 08:42:52 UTC
Low impact, but I will fix it in the new year. In the meantime, monthly totals are extrapolated from the remaining days. Once it is fixed, all missing files can be recreated from the permanently stored raw data.
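
To illustrate the extrapolation mentioned above, here is a minimal Python sketch. It assumes simple proportional scaling (the sum of the available days, scaled up to the length of the month); the comment does not state which method Wikistats actually uses. The function name `extrapolate_monthly_total` and the sample counts are hypothetical.

```python
import calendar


def extrapolate_monthly_total(year, month, daily_counts):
    """Scale the sum of the available days up to the full month length.

    daily_counts maps day of month to the count for that day and
    contains only the days for which a file exists.
    """
    if not daily_counts:
        raise ValueError("need at least one day of data")
    days_in_month = calendar.monthrange(year, month)[1]
    return sum(daily_counts.values()) * days_in_month / len(daily_counts)


# Made-up counts for three days of November 2013 (30 days).
print(extrapolate_monthly_total(2013, 11, {1: 100, 2: 200, 3: 300}))  # 6000.0
```

Proportional scaling ignores weekday effects, so an estimate built from only a few days can be noticeably off.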
Comment 4 christian 2013-12-19 14:35:56 UTC
It certainly is a problem for me; I have used those files several times.
(E.g. to understand the data we're seeing, to understand webstatscollector,
to understand pageviews)

Of course, I can run the aggregations myself when needed, but that means a huge
delay and waste of time :-/

Besides, it is a public set of daily data that has not been updated for
~1.5 months :-(
Comment 5 christian 2013-12-19 15:55:38 UTC
Not even 2 hours after comment #4, and I already needed
the data again :-)

I've just been pointed towards bug #58316. As we do not see the \x
hits in the sampled logs, I would naturally use Erik's merged files
to see if the problem is webstatscollector related.
Falling back to doing it by hand. Meh.
Comment 6 Erik Zachte 2013-12-19 16:24:45 UTC
Sorry, I did not realize you use it that often. I will look at it in the coming days.
Comment 7 christian 2013-12-20 11:27:09 UTC
(In reply to comment #6)
> Sorry, I did not realize you use it that often. I will look at it in the coming
> days.

Sorry, my point was not to mess with your scheduling. Not at all!
I just wanted to show that the data indeed gets used.

It's perfectly fine by me if we fix it in early 2014.
Comment 8 christian 2014-02-06 12:47:26 UTC
The daily files are coming in as expected again.
