Last modified: 2014-09-02 10:34:17 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and kept for historical purposes. It is not possible to log in, and apart from displaying bug reports and their history, links might be broken. See T65709, the corresponding Phabricator task, for complete and up-to-date bug report information.
Bug 63709 - intermittent DNS resolve problems with wmflabs domains
Status: RESOLVED FIXED
Product: Wikimedia Labs
Classification: Unclassified
Component: Infrastructure
Version: unspecified
Hardware: All All
Importance: Unprioritized critical
Target Milestone: ---
Assigned To: Nobody - You can work on this!
Depends on: 63717
Blocks:
Reported: 2014-04-08 22:14 UTC by se4598
Modified: 2014-09-02 10:34 UTC (History)
CC List: 7 users


Description se4598 2014-04-08 22:14:10 UTC
Reported today at various times, for various domains, and by various users.
I also saw confusing output when querying/pinging the domains from different machines. After some time, individual domains worked again.

Affected domains include:
* bits.beta.wmflabs.org
* tools-login.wmflabs.org
* simple.wikipedia.beta.wmflabs.org
* icinga.wmflabs.org
* ganglia.wmflabs.org
* bots.wmflabs.org

Excerpts from IRC:
* <se4598> I can't reach bits.beta. ping: unknown host bits.beta.wmflabs.org on 2 independent PCs.
* <scfc_de> se4598: Hard to say, but looks certainly odd; IIRC "unknown host" would imply an authoritative answer from the WMF DNS server (as different from "couldn't resolve", which means the DNS server wasn't reachable).  Coren rebooted wikitech about 15:20Z, and I think the LDAP server that provides the DNS records is located there as well, so that could explain some of that, but I wouldn't assume that the negative answers are cached for nearly 
* <se4598> scfc_de: icinga.wmflabs.org resolves (win7) on nslookup, but not on ping. On remote unix also icinga.wmflabs.org has address 208.80.155.156 but ping: unknown host icinga.wmflabs.org
* <se4598> What does it mean when "host <.....>" doesn't give an output but simply returns? Happens to me on one machine for "host icinga.wmflabs.org"
* <Withoutaname> se4598: actually I don't know if this is relevant but downforeveryoneorjustme.com is also reporting errors
* <se4598> Withoutaname: just pinged ganglia (a not already tested domain) first try: unknown host, 3 seconds later second try: it works.

Read more in today's IRC log from #wikimedia-labs (can't link because bots. doesn't resolve for me at the moment.....)
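
For anyone trying to reproduce the intermittent behaviour, a minimal Python sketch along these lines could log repeated lookups and roughly separate "name not found" answers from temporary resolver failures. It is only an illustration: the function names `classify` and `probe` are made up here, and the mapping of `EAI_NONAME` to a negative answer and `EAI_AGAIN` to an unreachable or failing DNS server is an assumption that depends on the platform's resolver.

```python
import socket
import time


def classify(hostname, resolver=socket.getaddrinfo):
    """Run one lookup and return a (status, detail) tuple.

    Assumption: EAI_NONAME roughly corresponds to a negative answer
    ("unknown host") and EAI_AGAIN to a temporary failure such as an
    unreachable DNS server. Real resolvers may report these differently.
    """
    try:
        infos = resolver(hostname, None)
    except socket.gaierror as exc:
        if exc.errno == socket.EAI_AGAIN:
            return ("temporary-failure", str(exc))
        if exc.errno == socket.EAI_NONAME:
            return ("not-found", str(exc))
        return ("other-error", str(exc))
    # Each getaddrinfo entry ends with a sockaddr tuple; its first item is the IP.
    addresses = sorted({info[4][0] for info in infos})
    return ("ok", ", ".join(addresses))


def probe(hostnames, attempts=3, delay=1.0,
          resolver=socket.getaddrinfo, sleep=time.sleep):
    """Look up every hostname `attempts` times, pausing `delay` seconds between rounds."""
    results = []
    for attempt in range(attempts):
        for name in hostnames:
            status, detail = classify(name, resolver)
            results.append((attempt, name, status, detail))
        if attempt + 1 < attempts:
            sleep(delay)
    return results
```

Calling `probe(["icinga.wmflabs.org", "ganglia.wmflabs.org"], attempts=5, delay=3.0)` and printing the returned tuples would show whether a name flips between statuses from one round to the next, as in the ganglia example above.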
Comment 1 Tim Landscheidt 2014-04-08 22:20:46 UTC
To elaborate: I'm wondering what happens (happened) when wikitech/virt* is unavailable:

- Will the DNS server query only one LDAP server, or fall back to the other?  (I think that there are two LDAP servers, but I may be wrong about that.)
- Do both LDAP servers have the same data?
- What does the DNS server return if no LDAP server is available?
Comment 2 Mark Holmquist 2014-04-08 22:48:37 UTC
Can you try going to the domains directly over HTTPS and report what happens? You may need to explicitly mark the certificates as trusted.
Comment 3 Mark Holmquist 2014-04-08 22:49:55 UTC
Oh, never mind, I saw ghosts of a totally different bug and jumped to conclusions. IGNORE ME!
Comment 4 se4598 2014-04-08 23:03:49 UTC
relevant irc log is http://bots.wmflabs.org/~wm-bot/logs/%23wikimedia-labs/20140408.txt
(hurray, I can reach bots again)
(our) posts related to this report are between 18:24:12 and 22:14:26 UTC / log-time
Comment 5 Andrew Bogott 2014-04-09 07:19:16 UTC
We restarted virt1000 last night (heartbleed) and that probably caused this outage due to the stupid order-of-startup bug with pdns vs. opendj.  I've opened bug 63717 about that.
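
As an illustration of this class of start-order problem (not the fix tracked in bug 63717), one generic guard is to wait until the backend accepts TCP connections before starting the service that depends on it. The Python sketch below assumes a hypothetical LDAP host name and the conventional LDAP port 389; the function names are invented for the example.

```python
import socket
import time


def port_is_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def wait_for_backend(host, port, retries=30, delay=2.0,
                     check=port_is_open, sleep=time.sleep):
    """Poll the backend until it is reachable or the retries are used up.

    Returns True as soon as `check` succeeds and False if every attempt fails.
    """
    for attempt in range(retries):
        if check(host, port):
            return True
        if attempt + 1 < retries:
            sleep(delay)
    return False
```

A wrapper script could call `wait_for_backend("ldap.example.org", 389)` (a placeholder host) and start the DNS server only when it returns True. An accepted TCP connection does not prove the directory is ready to answer queries, so this narrows the race but does not remove it.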
Comment 6 Tim Landscheidt 2014-04-09 15:31:56 UTC
I assume this has been fixed by Andrew restarting pdns in the meantime; the underlying problem will be dealt with in bug 63717.
