Last modified: 2014-08-27 18:29:31 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and kept for historical purposes. It is not possible to log in, and apart from bug reports and their history, links might be broken. See T72084, the corresponding Phabricator task, for complete and up-to-date bug report information.
Bug 70084 - DNS can resolve to IPv6 address despite lack of IPv6 connectivity
Status: RESOLVED WONTFIX
Product: Wikimedia Labs
Classification: Unclassified
Component: Infrastructure
Version: unspecified
Hardware: All All
Importance: Unprioritized normal
Target Milestone: ---
Assigned To: Nobody - You can work on this!
Reported: 2014-08-27 09:43 UTC by Antoine "hashar" Musso (WMF)
Modified: 2014-08-27 18:29 UTC (History)

Description Antoine "hashar" Musso (WMF) 2014-08-27 09:43:26 UTC
On deployment-bastion.eqiad.wmflabs we have a job pulling from Gerrit. It errored out with:

  INFO:mwextpull:cwd: /srv/scap-stage-dir/php-master/extensions
  INFO:mwextpull:running: git pull
  error: Failed to connect to 2620:0:861:3:208:80:154:81: Network is unreachable while accessing https://gerrit.wikimedia.org/r/p/mediawiki/extensions.git/info/refs

Ref: https://integration.wikimedia.org/ci/job/beta-code-update-eqiad/21755/consoleFull

The eth0 interface has both an IPv4 and an IPv6 address, though the latter is link-local only:

  inet addr:10.68.16.58  Bcast:10.68.23.255  Mask:255.255.248.0
  inet6 addr: fe80::f816:3eff:fe85:123f/64 Scope:Link
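A minimal sketch of how one could check that an interface has no routable IPv6 address, using the two addresses shown above. The helper name has_routable_ipv6 is hypothetical and not part of any tool mentioned in this report; it only relies on Python's standard ipaddress module.

```python
import ipaddress

def has_routable_ipv6(addresses):
    """Return True if any address is IPv6 and neither link-local nor loopback.

    Hypothetical helper for illustration only.
    """
    for text in addresses:
        # Drop a prefix length such as "/64" if one is attached.
        addr = ipaddress.ip_address(text.split("/")[0])
        if addr.version == 6 and not (addr.is_link_local or addr.is_loopback):
            return True
    return False

# Addresses copied from the eth0 output in this report.
eth0 = ["10.68.16.58", "fe80::f816:3eff:fe85:123f/64"]
print(has_routable_ipv6(eth0))  # prints False: fe80::/10 is link-local
```

A host in this state can still receive AAAA records from DNS, which is why a connection attempt to the IPv6 address fails immediately with "Network is unreachable".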

And:

$ host gerrit.wikimedia.org
gerrit.wikimedia.org has address 208.80.154.81
gerrit.wikimedia.org has IPv6 address 2620:0:861:3:208:80:154:81
$


A workaround would be to force ssh to IPv4 with GIT_SSH='ssh -4'.  But there is probably a better fix to be made in the way instances resolve DNS entries: either they should only use A records, or the DNS server should only return A records when queried over IPv4.
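Note that the failing request in the log above went over https, so an ssh-level option would probably not affect it. The same "IPv4 only" idea can be expressed at resolution time instead. The following is a minimal sketch, assuming a client that controls its own name resolution; resolve_ipv4_only is a hypothetical name, and this is not how git or ssh are configured.

```python
import socket

def resolve_ipv4_only(host, port):
    """Resolve host but keep only IPv4 (AF_INET) results.

    Hypothetical helper: restricting the address family means AAAA
    records are never considered, even if the DNS server returns them.
    """
    infos = socket.getaddrinfo(
        host, port, family=socket.AF_INET, type=socket.SOCK_STREAM
    )
    return [sockaddr for _family, _type, _proto, _canon, sockaddr in infos]

# A numeric address is used here so the example needs no DNS lookup.
print(resolve_ipv4_only("208.80.154.81", 443))
```

Passing a hostname such as gerrit.wikimedia.org instead of the literal address would perform a real lookup and return only its A records.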

Another possibility is that ssh first tries the IPv4 address and, if it fails to connect, falls back to the next DNS entry, which is the IPv6 address.  The error message would then mean that the service did not respond properly on the IPv4 address.
Comment 1 Marc A. Pelletier 2014-08-27 14:34:21 UTC
That's actually behaviour mandated by the standards.  DNS servers must return both A and AAAA RRs regardless of which protocol you reached them with, because it's entirely possible that you have only IPv4 connectivity to a DNS server yet only IPv6 to the host; and well-behaved code should always be using IPv6 to connect if it's available (although /robust/ code should be trying all addresses before giving up).

ssh actually does that by default though, so it's not obvious to me why that fails in your case.  Tests I just ran on some random labs instance reveal with strace that 'git pull' invokes ssh that does exactly what is expected:

connect(3, {sa_family=AF_INET6, sin6_port=htons(29418), inet_pton(AF_INET6, "2620:0:861:3:208:80:154:81", &sin6_addr), sin6_flowinfo=0, sin6_scope_id=0}, 28) = -1 ENETUNREACH (Network is unreachable)
connect(3, {sa_family=AF_INET, sin_port=htons(29418), sin_addr=inet_addr("208.80.154.81")}, 16) = 0

It tries IPv6 first, gets a network unreachable, then tries IPv4 and succeeds.
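The "try every address before giving up" behaviour seen in the strace output can be sketched as follows. This is only an illustration, not the code ssh or git actually run; connect_any is a hypothetical name. Python's standard socket.create_connection performs a similar loop over the getaddrinfo results.

```python
import socket

def connect_any(host, port, timeout=5.0):
    """Try each resolved address in order and return the first connected socket.

    Hypothetical helper. Addresses are tried in the order getaddrinfo
    returns them, which may list IPv6 before IPv4.
    """
    errors = []
    for family, socktype, proto, _canon, sockaddr in socket.getaddrinfo(
        host, port, type=socket.SOCK_STREAM
    ):
        sock = None
        try:
            sock = socket.socket(family, socktype, proto)
            sock.settimeout(timeout)
            sock.connect(sockaddr)
            return sock
        except OSError as exc:
            # e.g. ENETUNREACH for an IPv6 address on an IPv4-only host.
            errors.append((sockaddr, exc))
            if sock is not None:
                sock.close()
    # Report every failure, not just one, so the message is not misleading.
    raise OSError(f"all {len(errors)} address(es) failed: {errors}")
```

Calling connect_any("gerrit.wikimedia.org", 443) on a host like the one described here would be expected to fail on the IPv6 address, then succeed on the IPv4 one, matching the two connect() calls above.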

Could you try the following from a host where you get the error and paste the results:

strace -f git pull 2>&1 | grep connect
Comment 2 Antoine "hashar" Musso (WMF) 2014-08-27 15:55:32 UTC
Of all the build history I have, it only occurred twice for that job.  So it must be some very weird/rare issue.

On deployment-bastion.eqiad.wmflabs, I created a clone of https://gerrit.wikimedia.org/r/p/integration/zuul.git



$ strace -f git pull 2>&1 | grep connect
connect(4, {sa_family=AF_FILE, path="/var/run/nscd/socket"}, 110) = 0
connect(4, {sa_family=AF_FILE, path="/var/run/nscd/socket"}, 110) = 0
connect(4, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("10.68.16.1")}, 16) = 0
connect(4, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("10.68.16.1")}, 16) = 0
connect(4, {sa_family=AF_INET6, sin6_port=htons(443), inet_pton(AF_INET6, "2620:0:861:3:208:80:154:81", &sin6_addr), sin6_flowinfo=0, sin6_scope_id=0}, 28) = -1 ENETUNREACH (Network is unreachable)
connect(4, {sa_family=AF_UNSPEC, sa_data="\0\0\0\0\0\0\0\0\0\0\0\0\0\0"}, 16) = 0
connect(4, {sa_family=AF_INET, sin_port=htons(443), sin_addr=inet_addr("208.80.154.81")}, 16) = 0
connect(4, {sa_family=AF_INET, sin_port=htons(443), sin_addr=inet_addr("208.80.154.81")}, 16) = -1 EINPROGRESS (Operation now in progress)

So v6 -> v4 appropriately.

Maybe if the v4 address is unreachable, git reports the first failure message instead of the last one.
Comment 3 Marc A. Pelletier 2014-08-27 16:10:22 UTC
(In reply to Antoine "hashar" Musso from comment #2)
> Maybe if the v4 address is unreachable, git reports the first failure message
> instead of the last one.

That's actually very likely, leading to a potentially very confusing error message; though I'd expect that'd only be the case if /both/ errors were ENETUNREACH (which might be possible if there was a routing issue while the job was running).
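The hypothesis discussed here can be illustrated with a small simulation. It is only a sketch under stated assumptions: it does not reproduce git's or curl's real error handling, and try_all, fake_attempt and the report parameter are hypothetical names. The IPv4 failure is simulated as a timeout purely to show how the two policies differ.

```python
import errno

def try_all(addresses, attempt, report="last"):
    """Call attempt(addr) for each address; on total failure raise one error.

    report="first" re-raises the first failure, report="last" the last one.
    """
    errors = []
    for addr in addresses:
        try:
            return attempt(addr)
        except OSError as exc:
            errors.append(exc)
    if not errors:
        raise OSError("no addresses to try")
    raise errors[0] if report == "first" else errors[-1]

def fake_attempt(addr):
    """Simulated connect: IPv6 is unroutable, IPv4 times out."""
    if ":" in addr:
        raise OSError(errno.ENETUNREACH, f"Network is unreachable ({addr})")
    raise OSError(errno.ETIMEDOUT, f"Connection timed out ({addr})")

addresses = ["2620:0:861:3:208:80:154:81", "208.80.154.81"]
for policy in ("first", "last"):
    try:
        try_all(addresses, fake_attempt, report=policy)
    except OSError as exc:
        print(policy, "->", exc.strerror)
```

With the "first" policy the reported error names the IPv6 address even though the IPv4 attempt is the one that really mattered, which would produce a message like the one in the original report.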

Was the last occurrence of that bug recent?
Comment 4 Antoine "hashar" Musso (WMF) 2014-08-27 18:08:17 UTC
> Was the last occurrence of that bug recent?

9 hours ago: August 27th 9:00 UTC.


I will have a look at git source code and maybe attempt to reproduce the issue :D
Comment 5 Antoine "hashar" Musso (WMF) 2014-08-27 18:29:31 UTC
That might be some issue in curl, which git relies upon.  I am assuming the issue was Gerrit being unreachable somehow, and I am closing this bug.

Will reopen if that occurs more often.
