Last modified: 2014-04-02 13:03:03 UTC
On Thursday April 25th, the mw/core.git Jenkins jobs were slow to report back in Gerrit. Example change: https://gerrit.wikimedia.org/r/#/c/60765/ which took a good 20 minutes for gate-and-submit to report back. I suspect there were other jobs running at that time, such as the parser tests, which take a good 5 minutes to run. So if a lot of changes have been submitted (say X of them), the most recent change would be run (X+1) * 5 minutes after submission. This is due to the mediawiki-core-phpunit-parser job being shared among all pipelines. The patchsets sent for review (test pipeline) and the ones asked to be merged (gate-and-submit pipeline) end up racing for an execution slot in Jenkins.
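The (X+1) * 5 minutes estimate can be sketched as a tiny queue model. This is only an illustration: it assumes a single execution slot served first-in first-out, a fixed 5 minute job, and reads the estimate as time until the newest change is done. The function names are made up for this example and are not part of Jenkins or Zuul.

```python
def minutes_until_done(changes_ahead, job_minutes=5.0):
    """Estimate from the report: (X + 1) * job duration.

    changes_ahead is X, the number of changes already queued.
    Assumes one shared slot and a constant job duration.
    """
    return (changes_ahead + 1) * job_minutes


def simulate_single_slot(total_changes, job_minutes=5.0):
    """Run the changes one after another on a single slot.

    Returns the minute at which each change finishes, assuming
    all of them were submitted at minute 0.
    """
    finished = []
    clock = 0.0
    for _ in range(total_changes):
        clock += job_minutes  # the slot is busy for one full job
        finished.append(clock)
    return finished


if __name__ == "__main__":
    # With 3 changes ahead, the newest one is done after 20 minutes.
    print(minutes_until_done(3))
    print(simulate_single_slot(4))
```

Under these assumptions, three changes queued ahead are enough to produce a 20 minute delay, which is in the same range as the delay seen on change 60765. The real queue at that time is not known from the logs alone.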
Change 60765 got reported at 21:49 UTC. The build start times around then:

$ cd /var/lib/jenkins/jobs/mediawiki-core-phpunit-parser/builds
$ grep -o -P 'ZUUL_PIPELINE=[\w-]+' 2013-04-25_2*/log
2013-04-25_20-01-46/log:ZUUL_PIPELINE=gate-and-submit
2013-04-25_20-05-11/log:ZUUL_PIPELINE=gate-and-submit
2013-04-25_20-16-14/log:ZUUL_PIPELINE=gate-and-submit
2013-04-25_20-30-57/log:ZUUL_PIPELINE=test
2013-04-25_20-35-53/log:ZUUL_PIPELINE=gate-and-submit
2013-04-25_20-39-39/log:ZUUL_PIPELINE=test
2013-04-25_20-43-45/log:ZUUL_PIPELINE=test
2013-04-25_20-53-27/log:ZUUL_PIPELINE=gate-and-submit
2013-04-25_21-06-31/log:ZUUL_PIPELINE=gate-and-submit
2013-04-25_21-12-16/log:ZUUL_PIPELINE=gate-and-submit
2013-04-25_21-27-54/log:ZUUL_PIPELINE=test
2013-04-25_21-34-54/log:ZUUL_PIPELINE=gate-and-submit
2013-04-25_21-41-44/log:ZUUL_PIPELINE=gate-and-submit
2013-04-25_21-46-01/log:ZUUL_PIPELINE=gate-and-submit
2013-04-25_21-50-01/log:ZUUL_PIPELINE=gate-and-submit
2013-04-25_21-54-25/log:ZUUL_PIPELINE=gate-and-submit
2013-04-25_21-59-02/log:ZUUL_PIPELINE=test
2013-04-25_22-05-25/log:ZUUL_PIPELINE=gate-and-submit
2013-04-25_22-09-11/log:ZUUL_PIPELINE=test
2013-04-25_22-25-30/log:ZUUL_PIPELINE=test
2013-04-25_22-42-05/log:ZUUL_PIPELINE=test
2013-04-25_23-26-14/log:ZUUL_PIPELINE=test
2013-04-25_23-48-50/log:ZUUL_PIPELINE=gate-and-submit
$

As we can see, a lot of builds ran in a short amount of time. The gate-and-submit builds ran for both the master and wmf branches. Maybe I should switch Zuul to use a DependentPipelineManager for gate-and-submit. That would only run tests for the most recent gated change and merge them all if the tests succeed.
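To make the contention easier to see, the grep output can be tallied per pipeline and the gaps between consecutive build starts computed. The following is a rough sketch, assuming the `<timestamp>/log:ZUUL_PIPELINE=<name>` line format shown in the listing. The helper names are hypothetical, and the sample is just the five builds that started between 21:27 and 21:50.

```python
import re
from collections import Counter
from datetime import datetime

# Matches lines such as:
#   2013-04-25_21-34-54/log:ZUUL_PIPELINE=gate-and-submit
LINE_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2}_\d{2}-\d{2}-\d{2})/log:ZUUL_PIPELINE=([\w-]+)$"
)


def parse_builds(grep_output):
    """Return (start_time, pipeline) tuples sorted by start time."""
    builds = []
    for line in grep_output.splitlines():
        match = LINE_RE.match(line.strip())
        if match:  # blank or unrelated lines are skipped
            started = datetime.strptime(match.group(1), "%Y-%m-%d_%H-%M-%S")
            builds.append((started, match.group(2)))
    return sorted(builds)


def tally(builds):
    """Count builds per pipeline."""
    return Counter(pipeline for _, pipeline in builds)


def start_gaps_seconds(builds):
    """Seconds between each build start and the next one."""
    return [
        int((later[0] - earlier[0]).total_seconds())
        for earlier, later in zip(builds, builds[1:])
    ]


SAMPLE = """
2013-04-25_21-27-54/log:ZUUL_PIPELINE=test
2013-04-25_21-34-54/log:ZUUL_PIPELINE=gate-and-submit
2013-04-25_21-41-44/log:ZUUL_PIPELINE=gate-and-submit
2013-04-25_21-46-01/log:ZUUL_PIPELINE=gate-and-submit
2013-04-25_21-50-01/log:ZUUL_PIPELINE=gate-and-submit
"""

if __name__ == "__main__":
    builds = parse_builds(SAMPLE)
    print(tally(builds))
    print(start_gaps_seconds(builds))
```

On this sample the starts are 4 to 7 minutes apart, which fits a job of roughly 5 minutes running back to back on one slot.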
The gate-and-submit pipeline should be made a DependentPipeline, which is bug 48419. Also, Zuul is spamming Gerrit with change requests: https://review.openstack.org/#/c/27411/
This is still happening, especially around 10pm (CET) when the l10nbot sends about five hundred changes.
This is less of an issue right now: the merge jobs are no longer triggered, which helped with the load spike.
Got fixed by several changes, namely:
- l10nbot is no longer triggering jobs
- jobs now run concurrently
- we have more slaves in Jenkins
- Zuul got upgraded and reports faster