Last modified: 2013-06-10 13:11:24 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and for historical purposes. It is not possible to log in and except for displaying bug reports and their history, links might be broken. See T51330, the corresponding Phabricator task for complete and up-to-date bug report information.
Bug 49330 - Zuul: Gerrit's ssh event stream unavailable
Status: RESOLVED DUPLICATE of bug 46917
Product: Wikimedia
Classification: Unclassified
Component: Continuous integration
Version: wmf-deployment
Hardware: All All
Importance: Low critical
Target Milestone: ---
Assigned To: Nobody - You can work on this!
Duplicates: 49294
Depends on:
Blocks:
 
Reported: 2013-06-08 04:04 UTC by MWJames
Modified: 2013-06-10 13:11 UTC
CC List: 6 users

Description MWJames 2013-06-08 04:04:10 UTC
After 30 minutes Zuul still shows "Queue lengths: 0 events, 0 results", where I had hoped for [1] and [2] to show some signs of life, so I can probably assume that Jenkins died again (after yesterday's incident, bug 49294).

I'll leave the severity to be decided by someone else, but this is getting a bit awkward now, having issues with Jenkins two days in a row.

[1] https://gerrit.wikimedia.org/r/#/c/61171/

[2] https://gerrit.wikimedia.org/r/#/c/60092/
Comment 1 Antoine "hashar" Musso (WMF) 2013-06-08 07:39:51 UTC
Yesterday's issue was related to Gerrit having a full queue. Most probably it is the same today.
Comment 2 Antoine "hashar" Musso (WMF) 2013-06-08 07:57:42 UTC
Seems we just need to restart Gerrit. I have mailed the operations team about it.

Until it is restarted no tests are going to be triggered. That is surely annoying to people submitting patches meanwhile, but I do not think it is worth paging the whole ops team overnight. After all, sites are still up :)

Will follow up with Chad next week to set up proper monitoring for the Gerrit/Zuul/Jenkins processing chain. We will also want to fix the root cause in Gerrit.
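
One simple form such monitoring could take is a watchdog that flags the event stream as stalled when nothing has arrived for too long. The following is only a minimal sketch under stated assumptions: `StreamWatchdog` and its method names are hypothetical and not part of Zuul, Gerrit, or Jenkins, and the 30-minute threshold is borrowed from the wait described in the bug description rather than from any agreed value. A quiet period with no new changesets would also trip it, so the threshold would need tuning.

```python
import time


class StreamWatchdog:
    """Hypothetical helper: remember when the last event was seen
    and report whether the stream has been silent for too long."""

    def __init__(self, max_silence_seconds, clock=time.monotonic):
        self.max_silence = max_silence_seconds
        self.clock = clock  # injectable so the check can be tested
        self.last_seen = clock()

    def record_event(self):
        # Call this every time an event is read from the stream.
        self.last_seen = self.clock()

    def is_stalled(self):
        # True once the silence exceeds the configured threshold.
        return self.clock() - self.last_seen > self.max_silence


# Assumed threshold: 30 minutes without any event counts as stalled.
watchdog = StreamWatchdog(max_silence_seconds=30 * 60)
```

A periodic check (for example from a cron job or a monitoring plugin) could then call `watchdog.is_stalled()` and raise an alert instead of waiting for someone to notice that tests no longer run.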
Comment 3 Antoine "hashar" Musso (WMF) 2013-06-08 11:13:02 UTC
Alexandros restarted Gerrit a few minutes after I sent the email to ops and confirmed the service is back up.
Comment 4 MWJames 2013-06-09 01:41:35 UTC
Well, sorry, but tests are again not running, and this time I can't verify and merge because of the missing state/gate process.
Comment 5 Chad H. 2013-06-09 05:19:07 UTC
Restarted again...
Comment 6 Niklas Laxström 2013-06-09 18:07:42 UTC
And seems broken again...
Comment 7 Krinkle 2013-06-10 04:46:54 UTC
For the record (noticed it wasn't recorded in Bugzilla yet), the following is what Zuul reports in the log:

2013-06-09 04:49:18,781 ERROR gerrit.GerritWatcher: Exception on ssh event stream:

Traceback (most recent call last):
  File "./zuul/lib/gerrit.py", line 68, in _run
    self._listen(stdout, stderr)
  File "./zuul/lib/gerrit.py", line 52, in _listen
    self._read(stdout)
  File "./zuul/lib/gerrit.py", line 39, in _read
    data = json.loads(l)
  File "/usr/lib/python2.7/json/__init__.py", line 326, in loads
    return _default_decoder.decode(s)
  File "/usr/lib/python2.7/json/decoder.py", line 366, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/usr/lib/python2.7/json/decoder.py", line 384, in raw_decode
    raise ValueError("No JSON object could be decoded")
ValueError: No JSON object could be decoded
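
The traceback shows `json.loads` raising on a line read from the event stream, and that exception ends the watcher's read loop. Below is a minimal sketch of a more defensive read loop. It assumes, which the log does not confirm, that the offending line was empty or truncated; `read_events` is a hypothetical helper and not Zuul's actual `GerritWatcher` code.

```python
import io
import json


def read_events(stream):
    """Yield decoded events from a line-oriented JSON stream.

    Hypothetical helper: blank or malformed lines are skipped
    instead of raising, and the loop ends when the stream hits EOF.
    """
    for line in stream:
        line = line.strip()
        if not line:
            continue  # blank line: nothing to decode
        try:
            yield json.loads(line)
        except ValueError:
            # Malformed line; json.JSONDecodeError is a ValueError
            # subclass on Python 3, so this covers both versions.
            continue


# Simulated stream: two valid events around a blank and a bad line.
sample = io.StringIO(
    '{"type": "patchset-created"}\n'
    '\n'
    'not json\n'
    '{"type": "comment-added"}\n'
)
events = list(read_events(sample))
```

Skipping bad lines only keeps the reader alive. It would not help if Gerrit stops emitting events altogether, so a real fix would still need to address the cause on the Gerrit side.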
Comment 8 Krinkle 2013-06-10 04:59:32 UTC
(In reply to comment #6)
> And seems broken again...

Restarted again.

(In reply to comment #2)
> Seems we just need to restart Gerrit.

That's only a temporary solution (about 12 hours at most). Chad has said on the engineering list he'll work on this tomorrow.
Comment 9 Krinkle 2013-06-10 05:05:22 UTC
*** Bug 49294 has been marked as a duplicate of this bug. ***
Comment 10 Tim Landscheidt 2013-06-10 09:58:02 UTC
I assume the fact that no Bugzilla notifications of new changesets are being published is related to this as well?
Comment 11 christian 2013-06-10 10:31:56 UTC
(In reply to comment #10)
> I assume the fact that no Bugzilla notifications of new changesets are being
> published is related to this as well?

That seems unrelated. I filed a separate bug for it: bug 49388
Comment 12 Antoine "hashar" Musso (WMF) 2013-06-10 13:11:24 UTC
This ends up being a dupe of bug 46917, "Gerrit no more emit events when using `stream-events`", where the SSH connection between Zuul and Gerrit goes down because of a timeout and no events are ever sent again for new connections.

*** This bug has been marked as a duplicate of bug 46917 ***
