Last modified: 2014-04-02 13:03:03 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and for historical purposes. It is not possible to log in and except for displaying bug reports and their history, links might be broken. See T49724, the corresponding Phabricator task for complete and up-to-date bug report information.
Bug 47724 - mw/core gating jobs are delayed during peak hours
Status: RESOLVED FIXED
Product: Wikimedia
Classification: Unclassified
Component: Continuous integration
Version: wmf-deployment
Hardware: All All
Importance: Normal normal
Target Milestone: ---
Assigned To: Nobody - You can work on this!
Depends on: 48419
Blocks:
Reported: 2013-04-26 11:31 UTC by Antoine "hashar" Musso (WMF)
Modified: 2014-04-02 13:03 UTC (History)

Description Antoine "hashar" Musso (WMF) 2013-04-26 11:31:29 UTC
On Thursday, April 25th, the mw/core.git Jenkins jobs were slow to report back in Gerrit. Example change: https://gerrit.wikimedia.org/r/#/c/60765/ (gate-and-submit took a good 20 minutes to report back).

I suspect there were other jobs running at that time, such as the parser tests, which take a good 5 minutes to run. So if a lot of changes have been submitted (say X of them), the most recent change would only be run (X+1) * 5 minutes after submission.
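The arithmetic above can be sketched in a few lines. This is only an illustration of the (X+1) * 5 estimate, assuming a single execution slot and a fixed 5-minute job; the function name and parameters are hypothetical and not part of any Jenkins or Zuul API.

```python
def estimated_delay_minutes(changes_ahead: int, job_minutes: float = 5.0) -> float:
    """Estimate the delay for the newest change using the (X + 1) * 5 rule.

    changes_ahead is X, the number of changes already submitted.
    job_minutes is the assumed duration of one parser test run.
    """
    if changes_ahead < 0:
        raise ValueError("changes_ahead must not be negative")
    return (changes_ahead + 1) * job_minutes


# With three changes already queued, the estimate is (3 + 1) * 5 minutes.
print(estimated_delay_minutes(3))
```

Under these assumptions, three queued changes already give a delay of the same order as the 20 minutes seen on the example change.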

This is due to the mediawiki-core-phpunit-parser job being shared among all pipelines. The patchsets sent for review (test pipeline) and the ones asked to be merged (gate-and-submit pipeline) end up racing for an execution slot in Jenkins.
Comment 1 Antoine "hashar" Musso (WMF) 2013-04-26 11:54:16 UTC
Change 60765 got reported at 21:49 UTC.

The build start times around that time:

$ cd /var/lib/jenkins/jobs/mediawiki-core-phpunit-parser/builds
$ grep -o -P 'ZUUL_PIPELINE=[\w-]+' 2013-04-25_2*/log
2013-04-25_20-01-46/log:ZUUL_PIPELINE=gate-and-submit
2013-04-25_20-05-11/log:ZUUL_PIPELINE=gate-and-submit
2013-04-25_20-16-14/log:ZUUL_PIPELINE=gate-and-submit
2013-04-25_20-30-57/log:ZUUL_PIPELINE=test
2013-04-25_20-35-53/log:ZUUL_PIPELINE=gate-and-submit
2013-04-25_20-39-39/log:ZUUL_PIPELINE=test
2013-04-25_20-43-45/log:ZUUL_PIPELINE=test
2013-04-25_20-53-27/log:ZUUL_PIPELINE=gate-and-submit
2013-04-25_21-06-31/log:ZUUL_PIPELINE=gate-and-submit
2013-04-25_21-12-16/log:ZUUL_PIPELINE=gate-and-submit
2013-04-25_21-27-54/log:ZUUL_PIPELINE=test
2013-04-25_21-34-54/log:ZUUL_PIPELINE=gate-and-submit
2013-04-25_21-41-44/log:ZUUL_PIPELINE=gate-and-submit
2013-04-25_21-46-01/log:ZUUL_PIPELINE=gate-and-submit
2013-04-25_21-50-01/log:ZUUL_PIPELINE=gate-and-submit
2013-04-25_21-54-25/log:ZUUL_PIPELINE=gate-and-submit
2013-04-25_21-59-02/log:ZUUL_PIPELINE=test
2013-04-25_22-05-25/log:ZUUL_PIPELINE=gate-and-submit
2013-04-25_22-09-11/log:ZUUL_PIPELINE=test
2013-04-25_22-25-30/log:ZUUL_PIPELINE=test
2013-04-25_22-42-05/log:ZUUL_PIPELINE=test
2013-04-25_23-26-14/log:ZUUL_PIPELINE=test
2013-04-25_23-48-50/log:ZUUL_PIPELINE=gate-and-submit
$

As we can see, a lot of builds ran in a short amount of time. The gate-and-submit builds ran for both the master and wmf branches.
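To put numbers on this, the grep output can be tallied per hour and pipeline. The following is a minimal sketch, assuming lines in exactly the format shown above; `tally_builds` and `LINE_RE` are hypothetical names, and the sample lines are copied from the listing.

```python
import re
from collections import Counter

# Matches lines such as:
#   2013-04-25_21-41-44/log:ZUUL_PIPELINE=gate-and-submit
LINE_RE = re.compile(
    r"^\d{4}-\d{2}-\d{2}_(?P<hour>\d{2})-\d{2}-\d{2}/log:"
    r"ZUUL_PIPELINE=(?P<pipeline>[\w-]+)$"
)


def tally_builds(lines):
    """Count builds per (hour, pipeline); lines that do not match are skipped."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match:
            counts[(match.group("hour"), match.group("pipeline"))] += 1
    return counts


sample = [
    "2013-04-25_21-41-44/log:ZUUL_PIPELINE=gate-and-submit",
    "2013-04-25_21-46-01/log:ZUUL_PIPELINE=gate-and-submit",
    "2013-04-25_21-59-02/log:ZUUL_PIPELINE=test",
]

for (hour, pipeline), count in sorted(tally_builds(sample).items()):
    print(f"hour {hour} {pipeline}: {count}")
```

Feeding the full grep output to `tally_builds` instead of `sample` would show how the two pipelines share the job hour by hour.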

Maybe I should switch Zuul to use a DependentPipelineManager for gate-and-submit. That will only run tests for the most recent gated change and merge them all if the tests succeed.
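The idea described above can be modelled with a toy function. This is a sketch of the behaviour as described in this comment only, not Zuul code: `gate_batch`, `fake_tests` and the change names are hypothetical, and Zuul's actual manager is more involved, for example in how it deals with a failing change.

```python
from typing import Callable, List


def gate_batch(queue: List[str], run_tests: Callable[[List[str]], bool]) -> List[str]:
    """Toy model: one test run for the newest change stacked on all earlier ones.

    Returns the list of changes that get merged. Hypothetical helper.
    """
    if not queue:
        return []
    # A single run covers the newest change with every earlier change applied.
    if run_tests(queue):
        return list(queue)
    # This toy model merges nothing when the combined run fails.
    return []


runs = []


def fake_tests(changes: List[str]) -> bool:
    """Stand-in for the Jenkins job; records each invocation and always passes."""
    runs.append(list(changes))
    return True


merged = gate_batch(["change-a", "change-b", "change-c"], fake_tests)
print(merged, len(runs))
```

In this model three gated changes cost one test run instead of three, which is where the reduction in queueing would come from.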
Comment 2 Antoine "hashar" Musso (WMF) 2013-05-21 09:09:35 UTC
The gate-and-submit pipeline should be made a DependentPipeline, which is bug 48419.

Also, Zuul is spamming Gerrit with change requests. https://review.openstack.org/#/c/27411/
Comment 3 Antoine "hashar" Musso (WMF) 2013-06-28 10:39:19 UTC
This is still happening, especially around 10pm (CET) when the l10nbot sends half a thousand changes.
Comment 4 Antoine "hashar" Musso (WMF) 2013-09-03 12:38:12 UTC
This is less of an issue right now: the merge jobs are no longer triggered, which helped with the load spikes.
Comment 5 Antoine "hashar" Musso (WMF) 2014-04-02 13:03:03 UTC
This got fixed by several changes, namely:

- l10nbot no longer triggers jobs
- jobs now run concurrently
- we have more slaves in Jenkins
- Zuul got upgraded and reports faster
