
Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and preserved for historical purposes. Logging in is not possible, and beyond displaying bug reports and their history, links may be broken. See T70731, the corresponding Phabricator task, for complete and up-to-date bug report information.
Bug 68731 - Backing up wikimetrics data fails if data is written while we back it up
Status: RESOLVED FIXED
Product: Analytics
Classification: Unclassified
Component: Wikimetrics
Version: unspecified
Hardware/OS: All / All
Importance: Highest enhancement
Target Milestone: ---
Assigned To: christian
Whiteboard: u=AnalyticsEng c=Wikimetrics p=5 s=20...
Depends on:
Blocks:

Reported: 2014-07-28 12:20 UTC by christian
Modified: 2014-08-18 14:15 UTC
CC: 7 users

See Also:
Web browser: ---
Mobile Platform: ---
Assignee Huggle Beta Tester: ---


Attachments

Description christian 2014-07-28 12:20:46 UTC
The run of the hourly script for 2014-07-28 05:00 failed with

  tar: /var/lib/wikimetrics/public/69987: file changed as we read it
  Error: Either failed to get lock on /data/project/wikimetrics/backup/wikimetrics1/hourly, or tar-ing failed.

I checked the locks, and they had been properly cleaned up. So it seems
the issue was only that the file was being written while we tried to tar
it up.

Since we expect more writing over time, should we guard against this
happening again?
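
One possible guard, sketched below: GNU tar exits with status 1 (rather
than a hard error status) when a file changed as it was read, so the
archive step can be retried on that status alone. Paths, retry count,
and sleep interval here are hypothetical, not taken from the actual
backup script.

  # Sketch only: retry tar when it reports "file changed as we read it".
  src=/var/lib/wikimetrics/public
  dst=/srv/backup/wikimetrics-hourly.tar.gz

  for attempt in 1 2 3; do
      tar -czf "$dst" "$src"
      status=$?
      [ "$status" -eq 0 ] && break            # archive succeeded
      [ "$status" -eq 1 ] || exit "$status"   # anything above 1 is a hard error
      sleep 10                                # give the writer time, then retry
  done
  [ "$status" -eq 0 ] || { echo "Error: files kept changing during tar" >&2; exit 1; }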
Comment 1 christian 2014-07-28 15:41:10 UTC
It happened again for the 2014-07-28 14:00 run:

  tar: /var/lib/wikimetrics/public/69989: file changed as we read it
  tar: /var/lib/wikimetrics/public/69987: file changed as we read it
  Error: Either failed to get lock on /data/project/wikimetrics/backup/wikimetrics1/hourly, or tar-ing failed.

While the bug is of course valid as is, I'll stop reporting further
instances for now, as it seems wikimetrics1 is having more severe
issues (bug 68743).
Comment 2 Kevin Leduc 2014-08-08 14:47:54 UTC
Collaboratively tasked on Etherpad: http://etherpad.wikimedia.org/p/analytics-68731
Comment 3 Gerrit Notification Bot 2014-08-11 09:45:33 UTC
Change 153388 had a related patch set uploaded by QChris:
Reschedule backups to not interfere with queue runs so easily

https://gerrit.wikimedia.org/r/153388
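
The patch itself is not quoted in this report, but the idea lends itself
to a crontab sketch. The job names and minute offsets below are invented
for illustration, not taken from change 153388; the point is simply that
staggering the backup away from the top-of-the-hour queue run leaves the
public files quiescent while tar reads them.

  # Sketch only (hypothetical jobs): queue runs on the hour, backup at :40.
  0  * * * *  wikimetrics  /usr/local/bin/wikimetrics-queue-run
  40 * * * *  wikimetrics  /usr/local/bin/wikimetrics-backup-hourly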
Comment 4 Gerrit Notification Bot 2014-08-11 12:51:56 UTC
Change 153395 had a related patch set uploaded by QChris:
Force redis dump before backing up

https://gerrit.wikimedia.org/r/153395
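
Forcing a dump before the backup is a standard Redis pattern: trigger a
background save and wait for LASTSAVE to advance, so dump.rdb is a
fresh, consistent snapshot rather than a file that may still be
mid-write. A minimal sketch (the actual patch is change 153395 and may
differ):

  # Sketch only: make redis write a fresh dump.rdb, then archive it.
  before=$(redis-cli LASTSAVE)
  redis-cli BGSAVE >/dev/null
  while [ "$(redis-cli LASTSAVE)" = "$before" ]; do
      sleep 1                  # wait for the background save to finish
  done
  # dump.rdb is now consistent and safe to include in the tar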
Comment 5 Gerrit Notification Bot 2014-08-12 09:16:06 UTC
Change 153568 had a related patch set uploaded by QChris:
Make hourly backup keep around known-good full backups in case of issues

https://gerrit.wikimedia.org/r/153568
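
One common shape for this: write the new archive under a temporary name
and only rotate the previous backup once tar has succeeded, so a failed
run never destroys the last known-good copy. The file names below are
hypothetical; the actual layout is in change 153568.

  # Sketch only: keep the old backup until the new one is known to be good.
  backup_dir=/data/project/wikimetrics/backup/wikimetrics1/hourly
  if tar -czf "$backup_dir/latest.tar.gz.tmp" /var/lib/wikimetrics/public; then
      [ -e "$backup_dir/latest.tar.gz" ] && \
          mv -f "$backup_dir/latest.tar.gz" "$backup_dir/known-good.tar.gz"
      mv "$backup_dir/latest.tar.gz.tmp" "$backup_dir/latest.tar.gz"
  else
      rm -f "$backup_dir/latest.tar.gz.tmp"   # keep previous backups intact
  fi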
Comment 6 Gerrit Notification Bot 2014-08-14 18:55:24 UTC
Change 153388 merged by Ottomata:
Reschedule backups to not interfere with queue runs so easily

https://gerrit.wikimedia.org/r/153388
Comment 7 Gerrit Notification Bot 2014-08-15 16:52:56 UTC
Change 153568 merged by Ottomata:
Make hourly backup keep around known-good full backups in case of issues

https://gerrit.wikimedia.org/r/153568
Comment 8 Gerrit Notification Bot 2014-08-15 17:45:30 UTC
Change 153395 merged by Ottomata:
Force redis dump before backing up

https://gerrit.wikimedia.org/r/153395
Comment 9 nuria 2014-08-15 19:00:32 UTC
Tested thoroughly on dev, but this of course needs baking time in prod. I wish we had a "READY_TO_DEPLOY" status; that is how bugs should be left at the end of a sprint.
