Last modified: 2014-02-24 11:36:06 UTC
I'm tired of this. https://gerrit.wikimedia.org/r/106301 https://gerrit.wikimedia.org/r/106302 https://gerrit.wikimedia.org/r/106303 https://gerrit.wikimedia.org/r/106304 (twice). And that's just today; this has been going on for a few days, always with the same issue (the test runs for 10 minutes, then fails).
https://gerrit.wikimedia.org/r/#/c/103546/
https://gerrit.wikimedia.org/r/103546 , again. Can we please revert jenkins to when it worked?
The root cause is probably the fact that gerrit itself has been incredibly slow for the last few days, so the fetches keep timing out.
(In reply to comment #2)
> https://gerrit.wikimedia.org/r/103546 , again.
>
> Can we please revert jenkins to when it worked?

That is unrelated to the upgrade of Zuul I did last week, which "simply" adds a middleware between Zuul and Jenkins: Gearman.

The timeout issue is not related to Gerrit since we do not use it to fetch changes. The changes are fetched using something like:

    git fetch \
        refs/zuul/master/Z0e1e8799e33145bc911d2bd465d59179 \
        git://zuul.eqiad.wmnet/mediawiki/core \
        --reference=/srv/ssd/gerrit/mediawiki/core.git

The URL git://zuul.eqiad.wmnet/mediawiki/core points to the gallium server, which runs the Zuul daemon. That is where the merge references are created. They are published using git-daemon.

The reference /srv/ssd/gerrit/mediawiki/core.git is a replica of the git repositories, stored on the same disk as the job workspaces (/srv/ssd). This means that when cloning, git will use hardlinks and save a ton of network I/O and disk space.

When the fetch occurs, the client side apparently sends the server (git://zuul.eqiad.wmnet/) a list of all its objects; a diff is then made server side and the missing elements are sent back to the client.

The timeouts might occur when a new workspace is created, which causes the full repository to be sent to the client. That might end up taking longer than 10 minutes :/
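To make the reasoning above concrete, here is a deliberately simplified model of the exchange described in that comment: the server sends only what the client does not already have, so an empty workspace receives everything. This is a sketch under assumptions, not how git is implemented. The real protocol negotiates with "want" and "have" lines for commits rather than a full object list, and the object IDs and the function name `objects_to_send` are hypothetical.

```python
def objects_to_send(server_objects, client_objects):
    """Return the objects the server must transfer: those the client lacks.

    Simplified model only; real git negotiates on commits ("want"/"have").
    """
    return set(server_objects) - set(client_objects)


# Hypothetical object IDs, for illustration only.
server = {"a1", "b2", "c3", "d4"}

# An existing workspace already holds most of the history: small transfer.
warm_workspace = {"a1", "b2", "c3"}

# A freshly created workspace holds nothing: the whole repository is sent.
fresh_workspace = set()

warm_transfer = objects_to_send(server, warm_workspace)
fresh_transfer = objects_to_send(server, fresh_workspace)

print("warm workspace receives:", sorted(warm_transfer))
print("fresh workspace receives:", len(fresh_transfer), "of", len(server), "objects")
```

If this model is roughly right, the 10 minute timeout would mostly hit jobs that land in a new or badly initialized workspace, which matches the hypothesis in the comment above.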
Deleted the /srv/ssd/jenkins-slave/workspace/mediawiki-core-phpunit-misc@3 workspace on lanthanum.
I looked at the failing jobs of all the Gerrit changes mentioned above. In every case the mediawiki-core-phpunit job failed while running on the lanthanum server, in the workspace above. I guess that workspace ended up badly initialized somehow, though I am not sure what the root cause is.
I haven't noticed this issue for a few weeks now. I am assuming some workspace got corrupted and caused git to choke on it.