Last modified: 2012-11-16 17:48:12 UTC
When processing a job, each job runner removes the job from the queue as soon as it starts working on it. If the process is killed while working on a job, that job is lost and will never get picked up again. See the pop() function in includes/job/JobQueue.php. The job runners should instead mark a job 'in progress' when they get the lock on it and only remove it from the queue once it has completed successfully. Additionally, a sweeper process should pick up jobs that were started but never finished and re-insert them into the queue (or flag them as possibly broken).
Possibly just add a "last touched" timestamp that is null on insert. When a job is checked out, set the timestamp. As for the "sweeper", maybe if the job still hasn't finished within X time, set the timestamp back to null so the job can be done again?
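To make the proposal above concrete, here is a rough sketch of the claim / complete / sweep cycle. It is not MediaWiki code: it uses Python with an in-memory SQLite table, and the table layout, the column names (token, claimed_at) and the function names are all made up for illustration. The real job table and the pop() code path will differ.

```python
import sqlite3
import uuid


def make_queue():
    """Create a hypothetical job table (not the real MediaWiki schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE job ("
        " id INTEGER PRIMARY KEY,"
        " cmd TEXT NOT NULL,"
        " token TEXT,"        # NULL means nobody has claimed the job
        " claimed_at REAL)"   # NULL on insert, set when checked out
    )
    return conn


def push(conn, cmd):
    """Insert a job; token and claimed_at start out NULL."""
    cur = conn.execute("INSERT INTO job (cmd) VALUES (?)", (cmd,))
    conn.commit()
    return cur.lastrowid


def claim(conn, now):
    """Mark one unclaimed job 'in progress'. Returns (id, cmd, token) or None."""
    row = conn.execute(
        "SELECT id, cmd FROM job WHERE token IS NULL ORDER BY id LIMIT 1"
    ).fetchone()
    if row is None:
        return None
    token = uuid.uuid4().hex
    # The 'token IS NULL' guard means only one runner can win the row.
    cur = conn.execute(
        "UPDATE job SET token = ?, claimed_at = ? WHERE id = ? AND token IS NULL",
        (token, now, row[0]),
    )
    conn.commit()
    if cur.rowcount != 1:
        return None  # another runner claimed it first
    return row[0], row[1], token


def complete(conn, job_id, token):
    """Remove the job only after success, and only if we still own it."""
    cur = conn.execute(
        "DELETE FROM job WHERE id = ? AND token = ?", (job_id, token)
    )
    conn.commit()
    return cur.rowcount == 1


def sweep(conn, now, max_age):
    """Reset jobs claimed more than max_age seconds ago so they can be popped again."""
    cur = conn.execute(
        "UPDATE job SET token = NULL, claimed_at = NULL"
        " WHERE token IS NOT NULL AND claimed_at < ?",
        (now - max_age,),
    )
    conn.commit()
    return cur.rowcount
```

A runner would call claim(), do the work, then call complete() with the token it was given. If the runner dies in between, the row stays in the table and a periodic sweep() releases it after max_age seconds. Checking the token in complete() keeps a slow runner from deleting a job that has since been swept and handed to someone else.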
Partly done with https://gerrit.wikimedia.org/r/#/c/13194/. We still need to reset the token/timestamp for stale rows so they can be popped though.
https://gerrit.wikimedia.org/r/#/c/29736/