Last modified: 2012-06-08 18:41:55 UTC
The job-loop triggers calls to the MediaWiki maintenance/runJobs.php script. For some reason, the processes never end and eat up all the CPU. They are jobs like:

mwscript runJobs.php --wiki=commonswiki --procs=5 &

That is, no job type is specified.

The commonswiki job table had two job requests for webVideoTranscode:

(mw@deployment-sql) [commonswiki]> select * from job \G
*************************** 1. row ***************************
job_id: 1917
job_cmd: webVideoTranscode
job_namespace: 6
job_title: Mayday2012-edit-1.ogv
job_timestamp: 20120523195317
job_params: a:2:{s:13:"transcodeMode";s:10:"derivative";s:12:"transcodeKey";s:8:"160p.ogv";}
*************************** 2. row ***************************
job_id: 1918
job_cmd: webVideoTranscode
job_namespace: 6
job_title: Mayday2012-edit-1.ogv
job_timestamp: 20120523195317
job_params: a:2:{s:13:"transcodeMode";s:10:"derivative";s:12:"transcodeKey";s:9:"480p.webm";}
2 rows in set (0.00 sec)

(mw@deployment-sql) [commonswiki]>

So it seems the runJobs.php script keeps looping forever trying to complete the jobs. Deleting the jobs solves the looping issue:

(mw@deployment-sql) [commonswiki]> delete from job;
Query OK, 2 rows affected (0.38 sec)
Find out:
- why the triggered jobs never end up running
- why, despite there being only 2 jobs, there are several forked processes
- why the jobs stick in the queue
Looks like job::pop() fails to delete the jobs from the database :-(
I found the root cause while sleeping this weekend. Transcode jobs are excluded from processing by runJobs.php (through the use of $wgJobTypesExcludedFromDefaultQueue), whereas nextJobDB.php still considers those jobs as needing processing. The end result is an infinite loop, since the jobs are never processed. Hence the addition of $wgJobTypesExcludedFromDefaultQueue, by commit 45f9da8ad7, needs to be enhanced.
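To make the mismatch concrete, here is a small Python model of it. It is a sketch only, not MediaWiki code: the function names, the honor_exclusions flag and the max_rounds cap are all hypothetical, and the "fixed" branch shows only the general idea of making the selector respect the exclusion list, not the content of any actual patch.

```python
# Simplified model of the loop described above. Not MediaWiki code;
# every name below is hypothetical and chosen for illustration.

# Stands in for $wgJobTypesExcludedFromDefaultQueue.
EXCLUDED_TYPES = {"webVideoTranscode"}


def wiki_has_pending_jobs(queue, honor_exclusions):
    """Stand-in for nextJobDB.php: does this wiki need a job runner?"""
    if honor_exclusions:
        # Only count jobs that the default runner would actually process.
        return any(job not in EXCLUDED_TYPES for job in queue)
    # Buggy behaviour: any row in the job table counts as pending.
    return len(queue) > 0


def run_default_jobs(queue):
    """Stand-in for runJobs.php without a type: excluded jobs are left alone."""
    queue[:] = [job for job in queue if job in EXCLUDED_TYPES]


def job_loop(queue, honor_exclusions, max_rounds=5):
    """Return how many times a runner was launched (capped at max_rounds)."""
    rounds = 0
    while wiki_has_pending_jobs(queue, honor_exclusions) and rounds < max_rounds:
        run_default_jobs(queue)
        rounds += 1
    return rounds


# Two transcode jobs, as in the commonswiki job table above.
buggy_rounds = job_loop(["webVideoTranscode"] * 2, honor_exclusions=False)
fixed_rounds = job_loop(["webVideoTranscode"] * 2, honor_exclusions=True)
print(buggy_rounds, fixed_rounds)
```

Without the max_rounds cap, the first call would never return, because the selector keeps reporting work that the runner refuses to do. With the selector honoring the exclusion list, no runner is launched at all for a queue that holds only excluded jobs.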
Raising priority as a reminder to get that reviewed ASAP. It causes disruptions on deployment-prep. Patch to MW Core: https://gerrit.wikimedia.org/r/9116
Gerrit change #9116, which fixed nextJobDB.php, has been merged. A similar issue occurs in runJobs.php, which can also lead to an infinite loop. Proposed change: https://gerrit.wikimedia.org/r/10692
Both patches are merged. I have applied them to the beta cluster and the infinite loop no longer occurs.