Last modified: 2014-09-16 08:07:22 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and kept for historical purposes. It is not possible to log in, and apart from displaying bug reports and their history, links might be broken. See T72868, the corresponding Phabricator task, for complete and up-to-date bug report information.
Bug 70868 - deployment-salt can't talk to itself, git deploy hangs
Status: RESOLVED FIXED
Product: Wikimedia Labs
Classification: Unclassified
Component: deployment-prep (beta)
Version: unspecified
Hardware: All All
Importance: Unprioritized normal
Target Milestone: ---
Assigned To: Nobody - You can work on this!
Reported: 2014-09-15 21:46 UTC by C. Scott Ananian
Modified: 2014-09-16 08:07 UTC (History)

Description C. Scott Ananian 2014-09-15 21:46:12 UTC
cscott-free: git deploy sync is hanging (no console output at all) on beta.
bd808: lame. I wonder if salt is sad in beta
cscott-free: how can i tell?
bd808: "[WARNING ] Master hostname: salt not found. Retrying in 30 seconds"
bd808: cscott: I ran `sudo salt-call saltutil.sync_all` on deployment-bastion and got that error
bd808: Looks like salt is borked in beta
bd808: "This master address: 'salt' was previously resolvable but now fails to resolve! The previously resolved ip addr will continue to be used"
cscott-free: might be a nice entry for https://wikitech.wikimedia.org/wiki/Trebuchet once we figure out the problem
matanya: bd808: it is actually: Error: /Stage[main]/Role::Salt::Minions/Salt::Grain[instanceproject]/Exec[ensure_instanceproject_deployment-prep]/unless: Check "/usr/local/sbin/grain-ensure contains instanceproject deployment-prep" exceeded timeout
bd808: I get "Master hostname: salt not found. Retrying in 30 seconds" error on the salt master too :(
matanya: bd808: the root cause is: Warning: Unable to fetch my node definition, but the agent run will continue:
matanya: Warning: Connection refused - connect(2)
matanya: the host can't connect to puppet master
bd808: matanya: agreed
matanya: due to ferm rule changes i suspect
***bd808 shakes fist at ferm again
bd808: matanya: `iptables -L` is empty on deployment-salt and it can't talk to itself either
matanya: i rest my case :)
Comment 1 Bryan Davis 2014-09-15 21:48:47 UTC
On deployment-salt:

  $ salt '*' cmd.run hostname
  i-00000396.eqiad.wmflabs:
      deployment-pdf01
  i-00000504.eqiad.wmflabs:
      deployment-mathoid
  i-00000388.eqiad.wmflabs:
      deployment-stream

So only those 3 hosts are connected to the salt master.
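
To see which instances are missing rather than which are present, the output above could be compared against a list of expected hosts. A minimal sketch, assuming salt's default text output (a minion ID line ending in a colon, followed by the indented command output); the function name `connected_hosts` and the file `expected-hosts.txt` are hypothetical and not part of this report:

```shell
#!/usr/bin/env bash
# Extract the hostnames reported by `salt '*' cmd.run hostname`.
# Assumes the default output format: a "minion-id:" line followed by
# an indented line holding the command output.
connected_hosts() {
    # Skip blank lines and minion ID lines (which end in a colon);
    # print the remaining lines with surrounding whitespace dropped.
    awk 'NF && $NF !~ /:$/ { print $1 }' | sort
}

# Hypothetical usage on the salt master. expected-hosts.txt is an
# assumed file with one hostname per line, sorted:
#   salt '*' cmd.run hostname | connected_hosts > connected.txt
#   comm -23 expected-hosts.txt connected.txt   # expected but not connected
```

Filtering on the trailing colon instead of on indentation keeps the sketch working when the output has been re-indented, as it is in the comment above.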
Comment 2 Bryan Davis 2014-09-15 21:50:18 UTC
I tried restarting the salt-master process on deployment-salt and the salt-minion process on deployment-bastion and this didn't seem to help anything.
Comment 3 jeremyb 2014-09-16 05:48:17 UTC
2 prerequisites before booting salt-minion:

* kill all the existing salt-minion/grain-ensure/salt-call/etc. procs
* make sure /etc/salt/minion has been fixed after a31fd6929216e15a40a20a4b5716f9c75932bc62 (by puppet or by hand)

all are now salty except:
* deployment-saio
* deployment-parsoidcache02
* deployment-soa-cache01
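
The two prerequisites in this comment can be sketched as a small script. This is only an illustration: the function names `run` and `reset_minion`, the `DRY_RUN` variable, and the choice of `pkill -f` and `service salt-minion start` are assumptions, not the commands actually used. What counts as a "fixed" /etc/salt/minion depends on the referenced commit, so the sketch only prints the `master` setting for manual inspection.

```shell
#!/usr/bin/env bash
# Sketch: clear stale salt processes, show the minion's configured
# master, then start salt-minion. With DRY_RUN=1 (the default here)
# commands are printed instead of executed.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

reset_minion() {
    # Prerequisite 1: kill leftover processes (names taken from the
    # comment; its "etc." is not expanded here).
    for name in salt-minion grain-ensure salt-call; do
        run pkill -f "$name" || true
    done
    # Prerequisite 2: show the master setting for manual review. This
    # sketch does not decide whether the file has been fixed.
    run grep -E '^master:' /etc/salt/minion || true
    run service salt-minion start
}
```

Calling `reset_minion` with the default `DRY_RUN=1` only lists the commands, which allows a review before running it for real with `DRY_RUN=0` as root.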
Comment 4 Antoine "hashar" Musso (WMF) 2014-09-16 08:07:22 UTC
> all are now salty except:
> * deployment-saio
> * deployment-parsoidcache02
> * deployment-soa-cache01

Those instances haven't been migrated to the beta cluster puppet/salt masters.
