Last modified: 2014-04-01 07:50:11 UTC
Whenever I deploy integration/slave-scripts from tin.eqiad.wmnet, the minions are stuck in pending state.

ssh tin.eqiad.wmnet
$ cd /srv/deployment/integration/slave-scripts
$ git deploy start
$ git pull
$ git deploy sync
...
# INFO : created tag 'integration/slave-scripts-20140205-131314'
...
Running: sudo salt-call -l quiet publish.runner deploy.fetch 'integration/slave-scripts'
Repo: integration/slave-scripts; checking tag: integration/slave-scripts-20140205-131314
2 minions pending (2 reporting)
Continue? ([d]etailed/[C]oncise report,[y]es,[n]o,[r]etry):

Details show:

Continue? ([d]etailed/[C]oncise report,[y]es,[n]o,[r]etry): d
Repo: integration/slave-scripts; checking tag: integration/slave-scripts-20140205-131314
lanthanum.eqiad.wmnet: integration/slave-scripts-20131210-164114 (fetch: 1 [started: 3 mins, last-return: 2 mins])
lanthanum.eqiad.wmnet: integration/slave-scripts-20131210-164114 (fetch: 1 [started: 3 mins, last-return: 3 mins])
2 minions pending (2 reporting)

Looking at the minion lanthanum.eqiad.wmnet in /srv/deployment/integration/slave-scripts, the tag has been fetched:

lanthanum$ git tag |fgrep integration/slave-scripts-20131210-164114
integration/slave-scripts-20131210-164114
lanthanum$

I then continue:

Continue? ([d]etailed/[C]oncise report,[y]es,[n]o,[r]etry): y
Running: sudo salt-call -l quiet publish.runner deploy.checkout 'integration/slave-scripts,False'
Repo: integration/slave-scripts; checking tag: integration/slave-scripts-20140205-131314
2 minions pending (2 reporting)
Continue? ([d]etailed/[C]oncise report,[y]es,[n]o,[r]etry): d
Repo: integration/slave-scripts; checking tag: integration/slave-scripts-20140205-131314
lanthanum.eqiad.wmnet: integration/slave-scripts-20131210-164114 (fetch: 1 [started: 0 mins, last-return: 0 mins])
lanthanum.eqiad.wmnet: integration/slave-scripts-20131210-164114 (fetch: 1 [started: 0 mins, last-return: 0 mins])
2 minions pending (2 reporting)
Continue? ([d]etailed/[C]oncise report,[y]es,[n]o,[r]etry):

Looking on the minion, the tag has been checked out properly. At that point I just validate (y).

I guess salt is broken somehow with lanthanum.eqiad.wmnet and not reporting back properly.
Is this still an issue? Was this on the initial deployment or on a subsequent deployment?
The issue has apparently been around since December (note the tag dated 2013-12-10). I must have done something wrong at that time which caused the minions to get out of sync :-/ I am not sure how to fix it, but it definitely still happens. I end up looking at each minion during sync to make sure the tag is fetched and checked out properly.
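
For what it is worth, the per-minion check could be partly automated by parsing the detailed report instead of reading it by eye. The following is only a minimal sketch: the line format is inferred from the output pasted in this report, not from the deploy tool's source, and the names `REPORT_LINE` and `stale_minions` are hypothetical. It only tells you which minions report a tag other than the expected one; as noted above, the report itself may be wrong, so the tag on the minion still has to be confirmed by hand.

```python
import re

# Matches a detailed-report line such as:
#   lanthanum.eqiad.wmnet: integration/slave-scripts-20131210-164114 (fetch: 1 [started: 3 mins, last-return: 2 mins])
# ASSUMPTION: this pattern is inferred from the output pasted in this bug,
# not taken from the deploy tool itself.
REPORT_LINE = re.compile(
    r"^(?P<minion>\S+): (?P<tag>\S+) "
    r"\((?P<step>\w+): (?P<status>\d+) "
    r"\[started: (?P<started>\d+) mins, last-return: (?P<last_return>\d+) mins\]\)$"
)


def stale_minions(report_text, expected_tag):
    """Return (minion, reported_tag) pairs whose reported tag differs from expected_tag.

    Lines that do not look like per-minion report lines (the "Repo: ..." header,
    the "N minions pending" summary, the prompt) are ignored.
    """
    stale = []
    for line in report_text.splitlines():
        match = REPORT_LINE.match(line.strip())
        if match and match.group("tag") != expected_tag:
            stale.append((match.group("minion"), match.group("tag")))
    return stale
```

Feeding it the detailed report from the original description with the expected tag `integration/slave-scripts-20140205-131314` would list lanthanum.eqiad.wmnet with the old 20131210 tag, which is the set of hosts worth checking manually.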
Fixed by Ryan in I8b1fe93.
Was duplicated as bug 63029. It is fixed indeed.