Last modified: 2013-06-10 13:11:24 UTC
After 30 minutes Zuul still shows "Queue lengths: 0 events, 0 results", where I had hoped [1] and [2] would show some signs of life, so I can probably assume that Jenkins died again (after yesterday's incident, bug 49294). I'll leave the severity to be decided by someone else, but this is getting a bit awkward now, having issues with Jenkins two days in a row.
[1] https://gerrit.wikimedia.org/r/#/c/61171/
[2] https://gerrit.wikimedia.org/r/#/c/60092/
Yesterday's issue was related to Gerrit having a full queue. Most probably it is the same today.
Seems we just need to restart Gerrit. I have mailed the operations team about it. Until it is restarted, no tests are going to be triggered. That is surely annoying to people submitting patches in the meantime, but I do not think it is worth paging the whole ops team overnight. After all, the sites are still up :) I will follow up with Chad next week to set up proper monitoring for the Gerrit/Zuul/Jenkins processing chain. We will also want to fix the root cause in Gerrit.
Alexandros restarted Gerrit a few minutes after I sent the email to ops and confirmed the service is back up.
Sorry, but the tests are not running again, and this time I can't verify and merge because of the missing state/gate process.
Restarted again...
And seems broken again...
For the record (I noticed it wasn't recorded in Bugzilla yet), the following is what Zuul reports in the log:

2013-06-09 04:49:18,781 ERROR gerrit.GerritWatcher: Exception on ssh event stream:
Traceback (most recent call last):
  File "./zuul/lib/gerrit.py", line 68, in _run
    self._listen(stdout, stderr)
  File "./zuul/lib/gerrit.py", line 52, in _listen
    self._read(stdout)
  File "./zuul/lib/gerrit.py", line 39, in _read
    data = json.loads(l)
  File "/usr/lib/python2.7/json/__init__.py", line 326, in loads
    return _default_decoder.decode(s)
  File "/usr/lib/python2.7/json/decoder.py", line 366, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/usr/lib/python2.7/json/decoder.py", line 384, in raw_decode
    raise ValueError("No JSON object could be decoded")
ValueError: No JSON object could be decoded
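The traceback shows json.loads() raising ValueError on a line read from the ssh event stream, and that exception propagating out of the watcher. As a rough illustration only (this is not Zuul's actual code; the function name parse_event_lines and the sample event payloads are made up for the example), a reader could tolerate undecodable lines like this:

```python
import json


def parse_event_lines(lines):
    """Decode one JSON event per line; collect undecodable lines separately.

    Hypothetical helper for illustration, not part of Zuul.
    """
    events = []
    skipped = []
    for raw in lines:
        line = raw.strip()
        try:
            events.append(json.loads(line))
        except ValueError:
            # json.loads raises ValueError (JSONDecodeError on Python 3)
            # for empty or malformed input; keep the line for logging
            # instead of letting the exception end the read loop.
            skipped.append(line)
    return events, skipped


if __name__ == "__main__":
    sample = ['{"type": "patchset-created"}', "", "not json"]
    events, skipped = parse_event_lines(sample)
    print(len(events), "decoded,", len(skipped), "skipped")
```

Skipping bad lines only keeps the reader alive; it does nothing if the stream itself has stopped delivering events.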
(In reply to comment #6)
> And seems broken again...

Restarted again.

(In reply to comment #2)
> Seems we just need to restart Gerrit.

That's only a temporary solution (about 12 hours at most). Chad has said on the engineering list he'll work on this tomorrow.
*** Bug 49294 has been marked as a duplicate of this bug. ***
I assume the fact that no Bugzilla notifications of new changesets are being published is related to this as well?
(In reply to comment #10)
> I assume that no Bugzilla notifications of new changesets are published is
> related to this as well?

That seems unrelated. I filed a separate bug for it: bug 49388
This ends up being a dupe of bug 46917 "Gerrit no more emit events when using `stream-events`", where the ssh connection between Zuul and Gerrit goes down because of a timeout and no events are ever sent again for new connections.

*** This bug has been marked as a duplicate of bug 46917 ***
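Since the failure mode described here is a stream that silently stops delivering events, one possible building block for the monitoring mentioned earlier in this thread is a staleness check on the time of the last received event. The following is only a sketch under assumptions: the class name EventStreamWatchdog and the max_silence_seconds threshold are hypothetical, and nothing here is taken from Zuul or Gerrit.

```python
import time


class EventStreamWatchdog:
    """Track how long it has been since the last event was seen.

    Hypothetical helper for illustration, not part of Zuul or Gerrit.
    """

    def __init__(self, max_silence_seconds, clock=time.monotonic):
        self.max_silence_seconds = max_silence_seconds
        self._clock = clock  # injectable so the check can be tested
        self._last_event_at = clock()

    def record_event(self):
        # Call this each time an event arrives on the stream.
        self._last_event_at = self._clock()

    def is_stale(self):
        # True once the stream has been silent for longer than allowed.
        silence = self._clock() - self._last_event_at
        return silence > self.max_silence_seconds


if __name__ == "__main__":
    watchdog = EventStreamWatchdog(max_silence_seconds=1800)
    watchdog.record_event()
    print("stale:", watchdog.is_stale())
```

A quiet period with no Gerrit activity would look the same as a dead stream to this check, so the threshold is a judgment call and a stale result is a hint to investigate rather than proof of breakage.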