Last modified: 2014-03-13 13:19:52 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and for historical purposes. It is not possible to log in and except for displaying bug reports and their history, links might be broken. See T64234, the corresponding Phabricator task for complete and up-to-date bug report information.
Bug 62234 - http proxy doesn't work on eqiad
Status: RESOLVED FIXED
Product: Wikimedia Labs
Classification: Unclassified
Component: Other (Other open bugs)
Version: unspecified
Hardware: All All
Importance: Highest blocker
Target Milestone: ---
Assigned To: Nobody - You can work on this!
Depends on:
Blocks: 62042 62213
Reported: 2014-03-04 22:05 UTC by Peter Bena
Modified: 2014-03-13 13:19 UTC (History)




Description Peter Bena 2014-03-04 22:05:09 UTC
When I set up wmflabs.org DNS for any eqiad instance and try to relay it through https://wikitech.wikimedia.org/wiki/Special:NovaAddress, I get a bad gateway error.

This is likely because new instances can't be resolved from the old cluster.
Comment 1 Peter Bena 2014-03-04 22:44:27 UTC
Given that this blocks migration of all projects that require http, I am raising the priority to the top.
Comment 2 Peter Bena 2014-03-10 09:27:27 UTC
Nobody seems to care, so I'm pushing the priority even higher :P Time is running out and people can't migrate instances because of this.
Comment 3 Andre Klapper 2014-03-10 10:44:55 UTC
Petr: Did you talk to Yuvi before assigning it to him? Is Yuvi aware?
Adding blocker bug 62042.

> blocks migration of all projects that require http

How many projects are roughly affected? Priorities should reflect reality...
Comment 4 Tim Landscheidt 2014-03-10 11:02:30 UTC
As other projects seem to have successfully migrated, what is the instance's name and what doesn't work?

Also, [[wikitech:Special:NovaAddress]] neither relays http nor proxies anything else, but just assigns public IPs.  The web proxy is managed at [[wikitech:Special:NovaProxy]].

Reassigning to nobody until it's clear what's wrong.
Comment 5 Addshore 2014-03-10 13:58:10 UTC
This also seems to be happening for me sometimes. I'm currently trying to move one of my instances, using the temporary domain wdjenkins2.wmflabs.org.

Specifically http://wdjenkins2.wmflabs.org/ci returns a 502.

On my instance the web server is set up, and watching access.log, not a single request seems to get through.
Oddly, it occasionally starts working and I see requests coming through, and then it stops again.
Comment 6 Tim Landscheidt 2014-03-10 19:18:22 UTC
Addshore: Just for the obvious questions:

- This is a web proxy, i. e. [[wikitech:Special:NovaProxy]]?
- Your security groups are open "enough"?  (I. e. not 10.4.0.0/21, but 10.0.0.0/8?)
- Can you access the webserver on your instance directly from other instances in Labs?
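
To illustrate the security group question above, here is a minimal sketch of how a source range of 10.4.0.0/21 differs from 10.0.0.0/8. The two source addresses are hypothetical and chosen only to show one address inside the narrow range and one outside it; they are not taken from this bug.

```python
import ipaddress


def allowed(source_ip, cidr):
    """Return True if source_ip falls inside the security group's CIDR range."""
    return ipaddress.ip_address(source_ip) in ipaddress.ip_network(cidr)


narrow = "10.4.0.0/21"   # covers 10.4.0.0 through 10.4.7.255 only
wide = "10.0.0.0/8"      # covers every 10.x.x.x address

# Hypothetical source addresses, for illustration only.
inside_narrow = "10.4.1.5"
outside_narrow = "10.68.16.4"

for src in (inside_narrow, outside_narrow):
    print(src, "narrow:", allowed(src, narrow), "wide:", allowed(src, wide))
```

If the proxy's address sits outside the narrow range, a rule limited to 10.4.0.0/21 would reject its requests while a 10.0.0.0/8 rule would accept them.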
Comment 7 Peter Bena 2014-03-10 21:05:32 UTC
Just to make it clear: YES, I AM TALKING ABOUT WebProxy, not NovaAddress.

And yes, it's still broken, so don't ask people whether the problem is on their side or whether it's really broken, because it really is broken. So fix it plz.

wm-bot's public logs are down until this is fixed, because someone disabled creation of proxies for pmtpa, which is the only cluster where proxies currently work. So no new proxies can be created.
Comment 8 Peter Bena 2014-03-10 21:08:18 UTC
Tim Landscheidt: why should people on the NEW cluster open their firewall to the OLD cluster, which is about to be shut down? There is no point in doing that. It's the proxy servers that are borked and need to be migrated to the NEW cluster, so that they can see the servers on the NEW cluster, which is the one that matters now. The OLD cluster is going to be deleted, so there is no point in adjusting the NEW cluster to work with it; it needs to be done the other way around.
Comment 9 Peter Bena 2014-03-10 21:50:31 UTC
Btw, this doesn't work either:
80	80	tcp	0.0.0.0/0		
443	443	tcp	0.0.0.0/0
Comment 10 Peter Bena 2014-03-10 22:02:16 UTC
Andre: regarding affected projects: all projects on Labs that use the proxy, which is almost every project with a web server.
Comment 11 Andrew Bogott 2014-03-11 02:41:48 UTC
We have had an ongoing DNS problem in eqiad (well, and in pmtpa a little bit too.)  The dns cache for labs instances gets swamped and there are periodic, brief dns outages.

The behavior that I've seen for proxied instances is that things mostly work, much of the time, but periodically I see an nginx gateway error.  These errors seem to correspond to the dns outages.

1)  Does that explain this bug, or are we talking about something else here?

2)  Are things any better today?  I just cranked up the dnsmasq cache on labnet1001 in hopes of easing this problem.
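
One rough way to check whether the gateway errors line up with DNS outages is to resolve a hostname repeatedly and record which attempts fail. This is only a sketch: the function names `probe_dns` and `failure_rate` are made up for illustration, and the resolver is injectable so the logic can be exercised without touching the network.

```python
import socket
import time


def probe_dns(hostname, attempts=5, delay=0.0, resolver=socket.gethostbyname):
    """Resolve hostname several times; return one True/False per attempt."""
    results = []
    for _ in range(attempts):
        try:
            resolver(hostname)
            results.append(True)
        except OSError:
            # socket.gaierror (resolution failure) is a subclass of OSError.
            results.append(False)
        if delay:
            time.sleep(delay)
    return results


def failure_rate(results):
    """Fraction of failed attempts, or 0.0 when there were no attempts."""
    return results.count(False) / len(results) if results else 0.0
```

Running `probe_dns` with a non-zero `delay` from the proxy host, against an instance's hostname, would show whether failures come in brief bursts, as described above.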
Comment 12 Peter Bena 2014-03-11 08:55:54 UTC
I see only gateway errors. I don't know when these "windows when it works" are expected but I have never seen them, only errors.
Comment 13 Peter Bena 2014-03-11 08:56:10 UTC
Now the gateway error has been replaced with a 404 error.
Comment 14 Andrew Bogott 2014-03-11 09:51:23 UTC
This ought to be fixed by https://gerrit.wikimedia.org/r/#/c/118060/
Comment 15 Peter Bena 2014-03-11 13:53:18 UTC
But it's not. I still see the error.
Comment 16 Tim Landscheidt 2014-03-11 14:53:03 UTC
Are you sure the 404s are not coming from your webserver?
Comment 17 Andrew Bogott 2014-03-11 15:20:47 UTC
http://bots.wmflabs.org/ works for me now, I get the initial 'It works!' apache page.  bots.wmflabs.org resolves to 208.80.155.156

I'm about to step onto a plane, but will check in with this bug when I arrive (which will take more than a day :(  )

-A
Comment 18 Addshore 2014-03-11 16:01:03 UTC
Well, http://wdjenkins.wmflabs.org/ci seems to be working a hell of a lot better for me than it was when I first commented on this bug, and I have done nothing, so I can only guess that cranking up the dnsmasq cache did something!

Will report back if anything else unexpected happens!
Comment 19 Peter Bena 2014-03-12 08:32:36 UTC
I still do see:

404 Not Found
nginx/1.5.0

This is not from my server
Comment 20 Tim Landscheidt 2014-03-13 02:01:47 UTC
If I access http://bots.wmflabs.org/~wm-bot/logs as per http://permalink.gmane.org/gmane.science.linguistics.wikipedia.technical/75917, I get:

| Not Found

| The requested URL /~wm-bot/logs was not found on this server.

| Apache/2.2.22 (Ubuntu) Server at bots.wmflabs.org Port 80

This doesn't look like nginx.
Comment 21 Peter Bena 2014-03-13 08:00:13 UTC
When I open the same link I get:

404 Not Found
nginx/1.5.0

That doesn't look like apache to me
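
The two 404 pages quoted above differ in which software signs them. A minimal sketch of telling them apart by the `Server` response header follows. It assumes, as this thread suggests, that the proxy layer is nginx and the instance's web server is Apache; the function names are hypothetical, and the header may not always match the footer shown on the error page.

```python
import urllib.error
import urllib.request


def server_layer(server_header):
    """Guess which layer answered, based on the Server header's product name."""
    product = (server_header or "").split("/")[0].strip().lower()
    if product == "nginx":
        return "proxy"      # assumption: the web proxy runs nginx
    if product == "apache":
        return "backend"    # assumption: the instance runs Apache
    return "unknown"


def fetch_status_and_server(url, timeout=10):
    """Return (HTTP status, Server header) for url, including error responses."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status, resp.headers.get("Server")
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses raise HTTPError but still carry headers.
        return err.code, err.headers.get("Server")
```

A 404 classified as "proxy" would mean the request never reached the instance, while "backend" would mean the proxy forwarded it and the instance's own web server did not find the path.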
Comment 22 Peter Bena 2014-03-13 08:00:43 UTC
Maybe it only works at midnight, but now, in the morning (Europe), it doesn't.
Comment 23 Andrew Bogott 2014-03-13 13:04:50 UTC
It remains hard for me to debug this since I can't see the failure here.  Peter, can you please specify which exact URL is producing this failure?  Is it still just http://bots.wmflabs.org/ ?
Comment 24 Peter Bena 2014-03-13 13:19:52 UTC
Now it works... NOW. But I wouldn't be surprised if it stopped working again tomorrow morning :P
