Last modified: 2014-01-24 00:35:12 UTC
The job queue is in much better shape than it was two years ago. With the improved code and Redis architecture, it's very reliable and we're processing far more jobs than ever before. We keep up with the small day-to-day jobs without trouble: most queues on enwiki are near empty most of the time, or in the case of cirrus/htmlCache/linksUpdate jobs, hold maybe a few hundred to a few thousand at a time. No big deal. What we do *not* handle well is a large burst of jobs: someone edits a very high-use template, we reindex all of enwiki or Commons in Cirrus, anything like that. We end up with millions of jobs, and without manual intervention it takes weeks to clear the backlog. It would be nice to handle this case better. I have no clue what this better thing may be.
Putting giant bulk operations onto their own subqueues and interleaving their jobs with other work might be a good approach.
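As a rough illustration of what that interleaving could look like, here is a minimal sketch of a weighted scheduling policy. It is entirely hypothetical: in-memory deques stand in for the Redis-backed queues, the `bulk_ratio` knob and the `interleave` function are invented for this sketch, and a real runner would pop from Redis and dispatch to job workers instead of yielding values.

```python
from collections import deque

def interleave(regular, bulk, bulk_ratio=0.25):
    """Drain two queues, capping bulk jobs at roughly bulk_ratio of
    the executed slots so a multi-million-job burst on the bulk
    subqueue cannot starve the day-to-day work on the regular queue.

    regular, bulk -- deques of jobs (stand-ins for Redis lists)
    bulk_ratio    -- hypothetical tuning knob: max fraction of slots
                     given to bulk jobs while regular work is waiting
    """
    served_bulk = served_total = 0
    while regular or bulk:
        # Take from the bulk subqueue only when the regular queue is
        # empty, or when bulk jobs are still under their ratio so far.
        take_bulk = bool(bulk) and (
            not regular
            or (served_total > 0 and served_bulk / served_total < bulk_ratio)
        )
        if take_bulk:
            served_bulk += 1
            yield bulk.popleft()
        else:
            yield regular.popleft()
        served_total += 1

# Example: 6 regular jobs and a "burst" of 3 bulk jobs. The bulk jobs
# get spread out instead of blocking the regular work.
regular = deque(f"r{i}" for i in range(1, 7))
bulk = deque(f"b{i}" for i in range(1, 4))
order = list(interleave(regular, bulk))
```

The same idea could be expressed other ways (e.g. popping the bulk subqueue only every Nth slot), but the point is that a burst stops monopolizing the workers while still draining in the background.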