Last modified: 2014-08-27 13:49:11 UTC
If labsdb has replication lag (which I'm not 100% sure is possible), then recurrent reports might not see all the data for a given day when they run. That data would then be missing from every run, because reports filter on database timestamps while recurrent report runs are scheduled by actual time.

Example:
* day X starts
* edit 1
* edit 2
* replication lag means the rest of day X is not on labsdb yet
* edit 3
* day X ends
* recurrent report runs for day X, gets data from labsdb, and doesn't see edit 3
* ...
* recurrent report runs for day X+1, and the timestamp range filter means it doesn't see edit 3

Result: day X edit totals are off and never corrected.
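To make the failure mode concrete, here is a minimal sketch of the scenario above. The function name, the dates, and the in-memory lists standing in for labsdb are all made up for illustration; this is not the actual report code.

```python
from datetime import datetime, timedelta


def edits_in_window(replica_rows, start, end):
    """Count edits whose database timestamp falls in [start, end)."""
    return sum(1 for ts in replica_rows if start <= ts < end)


# Hypothetical dates: day X, day X+1, day X+2.
day_x = datetime(2014, 8, 1)
day_x1 = day_x + timedelta(days=1)
day_x2 = day_x + timedelta(days=2)

# Three edits made on day X (timestamps as written on the master).
edit1 = day_x + timedelta(hours=1)
edit2 = day_x + timedelta(hours=2)
edit3 = day_x + timedelta(hours=23)

# When the run for day X happens, the lagging replica only has edits 1 and 2.
replica_at_first_run = [edit1, edit2]
count_day_x = edits_in_window(replica_at_first_run, day_x, day_x1)

# By the next run the replica has caught up, but that run only filters on
# day X+1 timestamps, so edit 3 is never counted by any run.
replica_at_second_run = [edit1, edit2, edit3]
count_day_x1 = edits_in_window(replica_at_second_run, day_x1, day_x2)

print(count_day_x, count_day_x1)
```

Re-running the day X window after the replica has caught up would return 3, which is why either delaying the run or checking lag before scheduling avoids the problem.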
Collaborative tasking on etherpad: http://etherpad.wikimedia.org/p/analytics-68507
Per conversation with springle, this table will be in the db information_schema_p, which is present on every host and has open access. Information on lag will be reported per shard (s1, s2, ...).
As the title limits this to recurrent reports (although the same issue also affects non-recurrent reports), I am assuming that we really only need to cover recurrent reports.
That's a fine assumption. The mechanism we use to determine replag could be reused later. And for now, people running ad-hoc reports are probably used to replag, the same way that people running ad-hoc queries are.
Change 154267 had a related patch set uploaded by QChris: Reschedule recurring reports to 03:00 https://gerrit.wikimedia.org/r/154267
Change 154267 merged by Milimetric: Reschedule recurring reports to 03:00 https://gerrit.wikimedia.org/r/154267
Change 155003 had a related patch set uploaded by QChris: Stop scheduling new recurrent runs if databases lag https://gerrit.wikimedia.org/r/155003
Change 155003 merged by Milimetric: Stop scheduling new recurrent runs if databases lag https://gerrit.wikimedia.org/r/155003
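For readers who don't want to open the patch, the idea of "stop scheduling if databases lag" can be sketched roughly as below. This is not the merged code: the function name, the dict of per-shard lag values, and the 3-hour threshold are assumptions chosen only to illustrate the check.

```python
def should_schedule_recurrent_runs(lag_seconds_by_shard, max_lag_seconds=3 * 60 * 60):
    """Return True only if every shard reports a lag within the threshold.

    lag_seconds_by_shard: hypothetical mapping such as {"s1": 12, "s2": 40}.
    A value of None means the lag for that shard could not be determined.
    max_lag_seconds: assumed threshold, not taken from the actual patch.
    """
    if not lag_seconds_by_shard:
        # No lag information at all: be conservative and do not schedule.
        return False
    return all(
        lag is not None and lag <= max_lag_seconds
        for lag in lag_seconds_by_shard.values()
    )


print(should_schedule_recurrent_runs({"s1": 12, "s2": 40}))
print(should_schedule_recurrent_runs({"s1": 12, "s2": 5 * 60 * 60}))
```

The sketch treats unknown lag the same as too much lag, on the reasoning that a skipped run can be scheduled later while a run over incomplete data is never corrected.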
All relevant changes have been merged.