Last modified: 2013-06-10 05:05:22 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and kept for historical purposes. It is not possible to log in, and apart from displaying bug reports and their history, links might be broken. See T51294, the corresponding Phabricator task, for complete and up-to-date bug report information.
Bug 49294 - Automatic test runs and merging broken
Status: RESOLVED DUPLICATE of bug 49330
Product: Wikimedia
Classification: Unclassified
Component: Continuous integration
Version: unspecified
Hardware: All All
Importance: Highest critical
Target Milestone: ---
Assigned To: Chad H.
Depends on:
Blocks:
Reported: 2013-06-07 08:41 UTC by Niklas Laxström
Modified: 2013-06-10 05:05 UTC (History)


Description Niklas Laxström 2013-06-07 08:41:12 UTC
Looks like some part of CI/Jenkins/Zuul is stuck again. I've been trying to merge ULS patches since yesterday afternoon; I got one merged this morning and then it was stuck again.
Comment 1 Andre Klapper 2013-06-07 09:38:02 UTC
Tentatively assigning to hashar - feel free to reassign.
Comment 2 Andre Klapper 2013-06-07 09:47:50 UTC
Niklas reminded me that Hashar is probably on vacation...
 
Greg, any idea who else to assign this to?
Comment 3 Antoine "hashar" Musso (WMF) 2013-06-07 13:06:53 UTC
Looks like the Jenkins slave on deployment-bastion is not reachable. I will take a look at the Jenkins issue as soon as I reach my laptop, hopefully in roughly an hour.

Not much I can do right now, I have no clue about what is wrong.
Comment 4 Antoine "hashar" Musso (WMF) 2013-06-07 13:37:11 UTC
The deployment-bastion slave had less than 1GB of disk space left. Jenkins thus took it offline, and all jobs that were supposed to run on it were waiting for it to come back. I have lowered the disk threshold to 300MB, which brought the slave back up and dequeued the pending jobs.
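
As a rough illustration of the threshold behaviour described above (a sketch, not Jenkins' actual implementation): a node stays usable only while its free space is at or above the configured threshold. The function names and the 900MB sample figure are assumptions made for this example, and the 1GB value assumes the earlier threshold matched the "less than 1GB" figure mentioned in this comment.

```python
import shutil

MB = 1024 * 1024


def free_bytes(path="/"):
    """Return the free space, in bytes, on the filesystem holding `path`."""
    return shutil.disk_usage(path).free


def should_take_offline(free, threshold):
    """Return True when free space has dropped below the threshold."""
    return free < threshold


# Hypothetical figures: about 900MB free on the slave.
sample_free = 900 * MB
offline_with_1gb = should_take_offline(sample_free, 1024 * MB)
offline_with_300mb = should_take_offline(sample_free, 300 * MB)
```

With these sample numbers the node would be offline under a 1GB threshold and back online under a 300MB one. Lowering the threshold does not free any space, so the disk still needs cleaning up.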

I retriggered some gerrit changes by either rebasing them or commenting 'recheck' but the events do not reach Zuul on gallium.

I still have no ssh access. If someone sees this, please ask ops or demon to restart the zuul service on gallium, then rebase (or use 'recheck') a change to see if it triggers a job.

A possibility is that Gerrit no longer sends events to Zuul. That can be confirmed on gallium by tailing /var/log/zuul/debug.log. Whenever a comment is added in Gerrit, the file should show a bunch of JSON. If nothing is received on the Zuul side, we might want to restart Gerrit as well.
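
The log check suggested above could be sketched roughly as follows. This is only an illustration: the helper names are made up, and it assumes each event shows up in the log as a JSON object somewhere on a single line, which may not match the real format of debug.log.

```python
import json
from collections import deque

_DECODER = json.JSONDecoder()


def extract_json(line):
    """Return the first JSON object embedded in a log line, or None."""
    start = line.find("{")
    while start != -1:
        try:
            obj, _end = _DECODER.raw_decode(line, start)
            return obj
        except ValueError:
            # Not valid JSON from this brace; try the next one, if any.
            start = line.find("{", start + 1)
    return None


def count_event_lines(lines):
    """Count the lines that contain a parseable JSON object."""
    return sum(1 for line in lines if extract_json(line) is not None)


def tail(path, max_lines=200):
    """Return up to the last `max_lines` lines of a text file."""
    with open(path, errors="replace") as handle:
        return list(deque(handle, maxlen=max_lines))
```

On gallium, something like `count_event_lines(tail("/var/log/zuul/debug.log"))` could be run before and after commenting 'recheck' on a change. If the count does not grow, that would be consistent with events not reaching Zuul.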
Comment 5 Chad H. 2013-06-07 13:44:30 UTC
(In reply to comment #4)
> I still have no ssh access. If someone sees this, please ask ops or demon to
> restart the zuul service on gallium, then rebase (or use 'recheck') a change
> to see if it triggers a job.

Done.

> A possibility is that Gerrit no longer sends events to Zuul. That can be
> confirmed on gallium by tailing /var/log/zuul/debug.log. Whenever a comment
> is added in Gerrit, the file should show a bunch of JSON. If nothing is
> received on the Zuul side, we might want to restart Gerrit as well.

Gerrit should be fine, haven't done any restarting or upgrading today.
Comment 6 Antoine "hashar" Musso (WMF) 2013-06-07 14:13:06 UTC
Chad told me on IRC that the Gerrit replication was failing, which in turn filled the events queue, so no more events were being sent over gerrit stream-events, which is what Zuul uses.

Thanks Chad!!
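
For anyone wanting to check the event stream directly, here is a minimal sketch. It relies on `gerrit stream-events` printing one JSON object per line with a `type` field; the function names are made up for this example, and this is not how Zuul itself consumes the stream.

```python
import json
from collections import Counter


def event_type(line):
    """Return the "type" of one stream-events line, or None if unusable."""
    line = line.strip()
    if not line:
        return None
    try:
        event = json.loads(line)
    except ValueError:
        return None
    if not isinstance(event, dict):
        return None
    kind = event.get("type")
    return kind if isinstance(kind, str) else None


def summarize_events(lines):
    """Count how many events of each type appear in the given lines."""
    kinds = (event_type(line) for line in lines)
    return Counter(kind for kind in kinds if kind is not None)
```

The lines could come from `sys.stdin` when piping in the output of `ssh -p 29418 <gerrit-host> gerrit stream-events` (host left as a placeholder). An empty summary after commenting on a change would suggest that Gerrit has stopped emitting events.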
Comment 7 Antoine "hashar" Musso (WMF) 2013-06-08 07:58:05 UTC
Happened again, see bug 49330.
Comment 8 Krinkle 2013-06-10 05:05:22 UTC
Marking as dupe since it wasn't fixed. It is a bug between Gerrit and Zuul that just pops up again ~12 hours after Gerrit is restarted.

*** This bug has been marked as a duplicate of bug 49330 ***
