Last modified: 2012-11-16 17:48:12 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and kept for historical purposes. It is not possible to log in, and apart from displaying bug reports and their history, links might be broken. See T32165, the corresponding Phabricator task, for complete and up-to-date bug report information.
Bug 30165 - Job queue can lose jobs if job runners are killed
Status: RESOLVED FIXED
Product: MediaWiki
Classification: Unclassified
Component: JobQueue
Version: unspecified
Hardware: All All
Importance: Normal normal
Target Milestone: ---
Assigned To: Aaron Schulz
Depends on:
Blocks:
Reported: 2011-08-01 17:09 UTC by Ben Hartshorne
Modified: 2012-11-16 17:48 UTC (History)

Description Ben Hartshorne 2011-08-01 17:09:55 UTC
When processing a job, each job runner removes the job from the queue as soon as it starts working on it. If the process is killed while working on a job, that job will never get picked up again. See the pop() function in includes/job/JobQueue.php.

The job runners should mark a job 'in progress' when they get the lock on a job and only remove it from the queue when it is successfully completed.  Additionally, a sweeper process should pick up jobs that were started but never finished and re-insert them into the queue (or flag them as possibly broken).
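
The behaviour proposed above can be sketched roughly as follows. This is a minimal in-memory illustration in Python, not MediaWiki's JobQueue code; the class name `ClaimingJobQueue`, its methods, and the `claim_ttl` parameter are all made up for the example.

```python
import time
from collections import deque


class ClaimingJobQueue:
    """Toy in-memory queue: jobs are claimed first, removed only on ack()."""

    def __init__(self, claim_ttl=3600.0):
        self.pending = deque()      # jobs waiting for a runner
        self.in_progress = {}       # job -> time at which it was claimed
        self.claim_ttl = claim_ttl  # seconds before a claim counts as stale

    def push(self, job):
        self.pending.append(job)

    def claim(self, now=None):
        """Hand out the next job but remember it until ack() is called."""
        if not self.pending:
            return None
        job = self.pending.popleft()
        self.in_progress[job] = time.time() if now is None else now
        return job

    def ack(self, job):
        """Forget the job for good; call only after it completed successfully."""
        self.in_progress.pop(job, None)

    def sweep(self, now=None):
        """Re-insert jobs that were claimed but never acknowledged in time."""
        now = time.time() if now is None else now
        stale = [j for j, t in self.in_progress.items() if now - t > self.claim_ttl]
        for job in stale:
            del self.in_progress[job]
            self.pending.append(job)
        return stale


# A runner claims a job and is then killed, so ack() is never called.
queue = ClaimingJobQueue(claim_ttl=60)
queue.push("job-1")
lost = queue.claim(now=0)
recovered = queue.sweep(now=61)  # the sweeper puts the job back in the queue
```

A sweeper could just as well flag such jobs as possibly broken instead of re-inserting them; the sketch only shows the re-insert variant.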
Comment 1 Sam Reed (reedy) 2011-08-01 17:15:04 UTC
Possibly just have a "last touched" timestamp that is null on insert.

When a job is checked out, set the timestamp. The "sweeper" could then set it back to null if the job still hasn't finished within X time, so it can be done again.
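
As a rough sketch of this timestamp idea, assuming a SQL-backed queue: the example below uses Python's sqlite3 module with an in-memory database. The table layout, the `touched` column, and the `STALE_AFTER` value are assumptions for illustration, not MediaWiki's actual schema.

```python
import sqlite3

STALE_AFTER = 3600  # seconds; stands in for the "X time" above, value is arbitrary


def open_queue():
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE job ("
        " id INTEGER PRIMARY KEY,"
        " payload TEXT NOT NULL,"
        " touched INTEGER"  # NULL means nobody has checked the job out
        ")"
    )
    return db


def insert_job(db, payload):
    db.execute("INSERT INTO job (payload, touched) VALUES (?, NULL)", (payload,))


def check_out(db, now):
    """Stamp the oldest untouched job and return (id, payload), or None."""
    row = db.execute(
        "SELECT id, payload FROM job WHERE touched IS NULL ORDER BY id LIMIT 1"
    ).fetchone()
    if row is None:
        return None
    db.execute("UPDATE job SET touched = ? WHERE id = ?", (now, row[0]))
    return row


def finish(db, job_id):
    """Delete the row only once the job has actually completed."""
    db.execute("DELETE FROM job WHERE id = ?", (job_id,))


def sweep(db, now):
    """Set stale timestamps back to NULL so the jobs can be checked out again."""
    cur = db.execute(
        "UPDATE job SET touched = NULL WHERE touched IS NOT NULL AND touched < ?",
        (now - STALE_AFTER,),
    )
    return cur.rowcount
```

Note that the select-then-update in `check_out` is only safe with a single runner; with several concurrent runners, two of them could pick the same row.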
Comment 2 Aaron Schulz 2012-10-17 16:57:34 UTC
Partly done with https://gerrit.wikimedia.org/r/#/c/13194/.

We still need to reset the token/timestamp for stale rows, though, so that they can be popped again.
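
One way such a token/timestamp reset might look, sketched in Python with sqlite3. This is not the code from the Gerrit change; the `token` and `token_timestamp` column names, the empty-string convention for unclaimed rows, and the `max_age` parameter are assumptions made for the example.

```python
import sqlite3
import uuid


def make_table(db):
    db.execute(
        "CREATE TABLE job ("
        " id INTEGER PRIMARY KEY,"
        " payload TEXT NOT NULL,"
        " token TEXT NOT NULL DEFAULT '',"  # '' means unclaimed
        " token_timestamp INTEGER"          # when the claim was made
        ")"
    )


def claim(db, now):
    """Claim one unclaimed row with a fresh token; return (id, payload) or None."""
    token = uuid.uuid4().hex
    cur = db.execute(
        "UPDATE job SET token = ?, token_timestamp = ?"
        " WHERE id = (SELECT id FROM job WHERE token = '' ORDER BY id LIMIT 1)",
        (token, now),
    )
    if cur.rowcount == 0:
        return None
    # Only the runner that wrote this token can find the row it claimed.
    return db.execute(
        "SELECT id, payload FROM job WHERE token = ?", (token,)
    ).fetchone()


def release_stale(db, now, max_age):
    """Clear token and timestamp on rows claimed more than max_age seconds ago."""
    cur = db.execute(
        "UPDATE job SET token = '', token_timestamp = NULL"
        " WHERE token != '' AND token_timestamp < ?",
        (now - max_age,),
    )
    return cur.rowcount
```

Claiming through a single UPDATE and then reading the row back by token avoids a separate select-then-update step, and `release_stale` makes abandoned rows eligible for `claim` again.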
Comment 3 Aaron Schulz 2012-11-16 17:48:12 UTC
https://gerrit.wikimedia.org/r/#/c/29736/
