Last modified: 2013-05-06 19:20:39 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and for historical purposes. It is not possible to log in and except for displaying bug reports and their history, links might be broken. See T48176, the corresponding Phabricator task for complete and up-to-date bug report information.
Bug 46176 - [Worked around] Zuul slow to report back to Gerrit
Status: RESOLVED FIXED
Product: Wikimedia
Classification: Unclassified
Component: Continuous integration
Version: unspecified
Hardware: All All
Importance: Lowest major
Target Milestone: ---
Assigned To: Antoine "hashar" Musso (WMF)
Depends on: 46354
Blocks:
Reported: 2013-03-15 22:02 UTC by Antoine "hashar" Musso (WMF)
Modified: 2013-05-06 19:20 UTC (History)

See Also:
Web browser: ---
Mobile Platform: ---
Assignee Huggle Beta Tester: ---


Attachments

Description Antoine "hashar" Musso (WMF) 2013-03-15 22:02:53 UTC
For a few days now, Zuul has been slow to report completed builds back to Gerrit. There are most probably several root causes:

- When submitting a change, Zuul is locked. If Gerrit is slow to merge, the whole process is blocked until the change is merged.
- Zuul does not seem to recognize LOST builds properly, especially if it is the last of a set of jobs. It seems to consider the change still pending but does not report it, since the result is neither FAIL nor SUCCESS.
- Zuul was running a lot of git remote update commands; I reverted the patch responsible an hour ago.
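
To illustrate the second suspicion, here is a minimal sketch of a completion check that treats LOST as a terminal result, so a change whose last build is LOST would still be reported. This is not Zuul's actual code: the function names, the result strings and the dict layout are all assumptions made for illustration.

```python
# Hypothetical sketch, not Zuul's real implementation.
# The result strings below are assumed for illustration.
TERMINAL_RESULTS = {"SUCCESS", "FAILURE", "LOST"}


def all_builds_finished(results):
    """Return True when every job has a terminal result.

    `results` maps a job name to its result string, or to None while
    the build is still running.
    """
    return all(result in TERMINAL_RESULTS for result in results.values())


def should_report(results):
    """A change is reportable once it has builds and all are terminal."""
    return bool(results) and all_builds_finished(results)
```

With such a check, a change with results {"lint": "SUCCESS", "tests": "LOST"} would be reported instead of staying in the queue.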

Usually Zuul becomes stuck between 8pm and 11pm GMT, which are the busy hours: European volunteers are very active, the i18n bot is sending a lot of patches, and San Francisco is having a productive morning.

The signs of slowness are:
- https://integration.wikimedia.org/zuul/status shows a lot of changes with all builds completed
- Jenkins takes a long time to report back to Gerrit even for very simple checks (such as the ones on operations/puppet.git or translatewiki.net).


I have no idea what the fix is, but upgrading Zuul is probably going to help. The new version of Zuul depends on a Python module which is not available in Ubuntu Precise. I have packaged it and it is pending review/merge/deployment (see bug 44061).
Comment 1 Antoine "hashar" Musso (WMF) 2013-03-18 20:41:18 UTC
On March 18th, it took roughly 1 hour and 10 minutes to get a build report for https://gerrit.wikimedia.org/r/#/c/54513/ . The jobs completed successfully a few minutes after patch submission, but then they stayed in the status queue until being reported.
Comment 2 Antoine "hashar" Musso (WMF) 2013-03-18 20:44:50 UTC
Might be fixed by upstream patch https://review.openstack.org/#/c/23117/
Comment 3 Antoine "hashar" Musso (WMF) 2013-03-18 21:31:04 UTC
I will cherry pick that patch from upstream and get it deployed when Zuul/Gerrit is quiet (aka during European morning).
Comment 4 Antoine "hashar" Musso (WMF) 2013-03-19 08:41:57 UTC
git cherry-pick 263fba9
git push wikimedia HEAD:master

I have deployed the new Zuul version which is upstream ff79197 + the patch "Give the result event queue priority.".  The current sha1 is e9d929a.

That will most probably fix the issue, I will monitor that during the next rush hours.
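
For readers unfamiliar with the idea, here is a rough sketch of what giving the result event queue priority can look like. It is based only on the patch title quoted above, not on the upstream code; the function name and the two-queue layout are assumptions.

```python
import queue


def next_event(result_queue, trigger_queue):
    """Return (kind, event), always draining results before triggers.

    Returns None when both queues are empty. Hypothetical helper, not
    part of Zuul.
    """
    for kind, q in (("result", result_queue), ("trigger", trigger_queue)):
        try:
            return kind, q.get_nowait()
        except queue.Empty:
            continue
    return None


# Usage: a completed build is handled before an older trigger event.
results = queue.Queue()
triggers = queue.Queue()
triggers.put("patchset-created 1")
triggers.put("patchset-created 2")
results.put("build-completed A")

order = []
while True:
    item = next_event(results, triggers)
    if item is None:
        break
    order.append(item)
```

In this sketch, completed builds are never stuck behind a burst of incoming patch events, which is the symptom described in this bug.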
Comment 5 Antoine "hashar" Musso (WMF) 2013-03-20 11:53:49 UTC
The issue has been worked around with an upstream cherry-pick. I am thus lowering the priority.

This bug will be closed whenever Zuul is upgraded (bug 46354).
Comment 6 Antoine "hashar" Musso (WMF) 2013-03-29 21:44:26 UTC
Lowest priority since we have a workaround.
Comment 7 Antoine "hashar" Musso (WMF) 2013-04-24 09:13:40 UTC
The issue has been solved by the workaround (cherry-picked an upstream change).

I have upgraded Zuul (bug 46354) a few minutes ago and it includes the change. Nothing left to do so :)
Comment 8 Antoine "hashar" Musso (WMF) 2013-04-26 09:46:29 UTC
This happened again on Thursday April 25th. Example change https://gerrit.wikimedia.org/r/#/c/60765/

Took a good 20 minutes for gate-and-submit to report back.
Comment 9 Antoine "hashar" Musso (WMF) 2013-04-26 09:47:38 UTC
One possibility is that there were other jobs running at that time, such as the parser tests, which take a good 5 minutes to run. So if you have a lot of changes submitted (say X), the most recent change would be run (X+1) * 5 minutes after submission.
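
The estimate in this comment can be written as a tiny helper. This is only a sketch of the arithmetic, assuming the jobs run one after another and each takes about 5 minutes; the function and parameter names are made up for illustration.

```python
def estimated_delay_minutes(changes_ahead, job_minutes=5):
    """Delay estimate from this comment: (X + 1) * job duration.

    `changes_ahead` is X, the number of changes already submitted.
    Assumes jobs are processed serially, one change at a time.
    """
    if changes_ahead < 0:
        raise ValueError("changes_ahead must be non-negative")
    return (changes_ahead + 1) * job_minutes


# Example with a hypothetical X of 3 changes already queued.
delay = estimated_delay_minutes(3)
```

With X = 3 the formula gives (3 + 1) * 5 = 20 minutes. That value of X is chosen only as an example and is not a measurement from the incident.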
Comment 10 Antoine "hashar" Musso (WMF) 2013-05-06 19:20:39 UTC
Zuul has been upgraded, which brought some performance improvements. We had some issues, such as over-querying Jenkins from time to time.
