Last modified: 2014-04-14 19:28:06 UTC
See folder http://dumps.wikimedia.org/other/pagecounts-ez/merged/ . The monthly files for 2014-02 and 2014-03 are 1.7 GB and 2.1 GB respectively, instead of the usual 4.5 GB. Alex Druk: "I compared the Jan and Mar aggregated data files. As you can see from the enclosed data, many projects (eo-ps) are missing in March. Always ready to help..."
Prioritization and scheduling of this bug is tracked on Mingle card https://wikimedia.mingle.thoughtworks.com/projects/analytics/cards/cards/1540
Problem analyzed: earlier this week I re-enabled the job dammit_compact_daily.sh, which had not run since the dumps server was migrated. So it had to go through a lengthy update cycle, generating some 20 daily files. After all daily dumps have been generated, the monthly aggregation script dammit_compact_monthly.sh is invoked. This should only find work to do once a month. But because dammit_compact_daily.sh had so much catching up to do, its last step, dammit_compact_monthly.sh, was still running 24 hrs later when the next daily cron job started. That second job did not find the finished monthly files, so it also started the monthly aggregation phase while the first was still writing them. Clearly this monthly step should be protected against multiple instances.
Protected dammit_compact_daily.sh and dammit_compact_monthly.sh against multiple concurrent invocations with flock
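For readers unfamiliar with the technique, here is a minimal sketch of how a cron-driven shell script can be guarded with flock. It is not the actual change made to the dammit scripts: the lock file path and the run_exclusive helper are assumptions chosen for illustration.

```shell
#!/bin/bash
# Hypothetical lock file path; the real scripts may use a different location.
LOCKFILE="${LOCKFILE:-/tmp/dammit_compact_monthly.lock}"

# run_exclusive (hypothetical helper): run a command only if no other
# process currently holds the lock on $LOCKFILE.
run_exclusive() {
    (
        # -n: fail immediately instead of waiting when the lock is held
        flock -n 9 || { echo "another instance is running, skipping" >&2; exit 1; }
        "$@"
    ) 9>"$LOCKFILE"   # open the lock file on file descriptor 9 for the subshell
}
```

A cron-invoked wrapper could then call something like `run_exclusive ./dammit_compact_monthly.sh`. With `-n`, a second invocation that overlaps a long-running first one exits at once rather than queuing behind it, and the lock is released automatically when the subshell ends.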
Hi Erik -- how can we confirm this is fixed? IIRC you confirmed that this fix worked for a separate bug. thanks, -Toby
Toby, the files have been regenerated properly. And I tested the new shielding with 'flock' against concurrent runs. So I will close this bug now. Cheers, Erik