Last modified: 2014-02-24 11:36:06 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and kept for historical purposes. It is not possible to log in, and apart from displaying bug reports and their history, links might be broken. See T61930, the corresponding Phabricator task, for complete and up-to-date bug report information.
Bug 59930 - git fetches timing out, also cause bogus jenkins failures
Status: RESOLVED WORKSFORME
Product: Wikimedia
Classification: Unclassified
Component: Continuous integration
Version: unspecified
Hardware: All All
Importance: Unprioritized major
Target Milestone: ---
Assigned To: Nobody - You can work on this!
Depends on:
Blocks:
Reported: 2014-01-10 23:57 UTC by Bartosz Dziewoński
Modified: 2014-02-24 11:36 UTC (History)
CC List: 8 users

See Also:
Web browser: ---
Mobile Platform: ---
Assignee Huggle Beta Tester: ---


Description Bartosz Dziewoński 2014-01-10 23:57:56 UTC
I'm tired of this.

https://gerrit.wikimedia.org/r/106301
https://gerrit.wikimedia.org/r/106302
https://gerrit.wikimedia.org/r/106303
https://gerrit.wikimedia.org/r/106304 (twice)

And that's just today; this has been going on for a few days, always with the same issue (the test runs for 10 minutes, then fails).
Comment 1 Bartosz Dziewoński 2014-01-11 13:25:14 UTC
https://gerrit.wikimedia.org/r/#/c/103546/
Comment 2 Bartosz Dziewoński 2014-01-12 13:09:22 UTC
https://gerrit.wikimedia.org/r/103546 , again.

Can we please revert jenkins to when it worked?
Comment 3 Bartosz Dziewoński 2014-01-12 13:21:02 UTC
The root cause is probably that Gerrit itself has been incredibly slow for the last few days, so the fetches keep timing out.
Comment 4 Antoine "hashar" Musso (WMF) 2014-01-12 13:54:48 UTC
(In reply to comment #2)
> https://gerrit.wikimedia.org/r/103546 , again.
> 
> Can we please revert jenkins to when it worked?

That is unrelated to the Zuul upgrade I did last week, which "simply" adds a middleware layer between Zuul and Jenkins: Gearman.

The timeout issue is not related to Gerrit, since the jobs do not fetch from it. The changes are fetched using something like:

git fetch \
  refs/zuul/master/Z0e1e8799e33145bc911d2bd465d59179 \
  git://zuul.eqiad.wmnet/mediawiki/core \
  --reference=/srv/ssd/gerrit/mediawiki/core.git


The URL git://zuul.eqiad.wmnet/mediawiki/core points to the gallium server, which runs the Zuul daemon. That is where the merge references are created. They are published using git-daemon.

The reference /srv/ssd/gerrit/mediawiki/core.git is a replica of the git repositories that lives on the same disk as the job workspaces (/srv/ssd). This means that when cloning, git can borrow objects from that local copy instead of transferring them, which saves a ton of network I/O and disk space.
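
As a rough illustration of the setup described above, here is a minimal sketch that uses throwaway local repositories in place of the real servers. Everything in it is an assumption made for the demo: the temporary paths, the ref name Zexample, and the stand-in identity used for the commit. Note that in stock git, --reference is an option of git clone (it records the reference repository in objects/info/alternates), and git fetch takes the repository before the refspec.

```shell
#!/bin/sh
# Sketch only: temp dirs stand in for the real hosts and paths.
set -eu
work=$(mktemp -d)

# Stand-in for the repository served by git-daemon on the Zuul host.
git init -q "$work/upstream"
git -C "$work/upstream" -c user.name=ci -c user.email=ci@example.invalid \
    commit -q --allow-empty -m "initial commit"
# Hypothetical merge ref, named after the refs/zuul/<branch>/Z... pattern.
git -C "$work/upstream" update-ref refs/zuul/master/Zexample HEAD

# Stand-in for the local replica used as the reference repository.
git clone -q --mirror "$work/upstream" "$work/replica.git"

# New workspace: objects already present in the replica are borrowed
# through an alternates entry instead of being transferred again.
git clone -q --reference "$work/replica.git" "$work/upstream" "$work/workspace"
cat "$work/workspace/.git/objects/info/alternates"

# Fetch the merge ref: repository first, then the refspec.
git -C "$work/workspace" fetch -q "$work/upstream" refs/zuul/master/Zexample
fetched=$(git -C "$work/workspace" rev-parse FETCH_HEAD)
echo "fetched $fetched"
```

If the alternates file is missing or points at a path that no longer exists, a fresh workspace has nothing to borrow from and has to receive every object over the network.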


When the fetch occurs, the client apparently sends the server (git://zuul.eqiad.wmnet/) a list of the objects it already has; the server then computes the difference and sends the missing objects back to the client.

The timeouts might occur when a new workspace is created, which causes the full repository to be sent to the client. That might end up taking longer than 10 minutes :/
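
To tell a fetch that hit the time limit apart from one that failed for another reason, a wrapper along these lines could help. It is only a sketch: run_with_limit is a hypothetical helper (not part of Jenkins or Zuul), it assumes the GNU coreutils timeout command, and sleep stands in for a slow fetch so the example runs without network access.

```shell
#!/bin/sh
# Sketch only: run_with_limit is a hypothetical helper.
# GNU coreutils `timeout` exits with status 124 when the limit is reached.
run_with_limit() {
    limit=$1
    shift
    status=0
    timeout "$limit" "$@" || status=$?
    if [ "$status" -eq 124 ]; then
        echo "timed out after ${limit}s: $*" >&2
    fi
    return "$status"
}

# In CI this might wrap the real fetch, for example:
#   run_with_limit 600 git fetch <repository> <refspec>
# Here `sleep` plays the part of a fetch that takes too long.
rc=0
run_with_limit 1 sleep 5 || rc=$?
echo "exit status: $rc"
```

A status of 124 points at the time limit itself, while any other non-zero status comes from the wrapped command.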
Comment 5 Antoine "hashar" Musso (WMF) 2014-01-12 14:04:01 UTC
Deleted the /srv/ssd/jenkins-slave/workspace/mediawiki-core-phpunit-misc@3 workspace on lanthanum.
Comment 6 Antoine "hashar" Musso (WMF) 2014-01-12 14:08:43 UTC
I looked at the failing jobs of all the Gerrit changes mentioned above. In all of them, the mediawiki-core-phpunit job failed while running on the lanthanum server in the workspace mentioned above.

I guess it ended up badly initialized somehow. I am not sure what the root cause is, though.
Comment 7 Antoine "hashar" Musso (WMF) 2014-02-24 11:36:06 UTC
I haven't noticed this issue for a few weeks now. I am assuming some workspace got corrupted and caused git to choke on it.
