Last modified: 2014-05-30 10:10:20 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and for historical purposes. It is not possible to log in and except for displaying bug reports and their history, links might be broken. See T54409, the corresponding Phabricator task for complete and up-to-date bug report information.
Bug 52409 - Expose deployment status in -operations channel topic
Status: NEW
Product: Wikimedia
Classification: Unclassified
Component: Deployment systems
Version: wmf-deployment
Hardware: All All
Importance: Lowest enhancement
Target Milestone: ---
Assigned To: Nobody - You can work on this!
Depends on:
Blocks:
Reported: 2013-08-01 17:35 UTC by Greg Grossmeier
Modified: 2014-05-30 10:10 UTC (History)
6 users (show)




Description Greg Grossmeier 2013-08-01 17:35:15 UTC
The problem:
Especially during lightning deploys, but other times as well, 2 or more teams are coordinating around each other to deploy code. This results in a lot of back and forth, eg:

<personA> hey personB, how's it going?
...some non-zero time passes...
<personB> personA: about ready to scap
...some much larger than zero time passes...
<personA> personB: ping
...some non-zero time passes...
<personB> personA: oh yeah, done now.

Granted, there are the logmsgbot messages about when certain commands are run/started (and in some limited cases, completed, eg localization), but those can be easily missed if puppet decides that a bunch of hosts are fresh or not fresh.

Proposed solution:
Include the current status of deployments in the /topic in #wikimedia-operations.

This could begin as simple as:
"DeployStatus: Open" meaning "no deploy script is running"
"DeployStatus: Scap'ing" meaning uh, a scap is running, yeah
"DeployStatus: Syncing" meaning that a sync-dir/file is running
"DeployStatus: Localization Cache updating" meaning the obvious


Ideally, we'd also have one of our bots take commands like:
"!deploy take" which changes the /topic to include "DeployStatus: <nick> has the deploy stick"
"!deploy done" which puts it back to "DeployStatus: Open"

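A minimal sketch of how a bot might hold this state and render the topic fragment. The class and method names are hypothetical and no IRC library is involved; it only produces the "DeployStatus: ..." string. Showing the stick holder and the running script together is one possible rendering, and the sketch deliberately does not decide what happens when a second person runs "!deploy take".

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DeployStatus:
    """Who holds the deploy stick and which deploy script, if any, is running."""

    holder: Optional[str] = None    # nick that issued "!deploy take"
    activity: Optional[str] = None  # e.g. "Scap'ing", "Syncing"

    def take(self, nick: str) -> None:
        # "!deploy take": record the nick holding the stick.
        self.holder = nick

    def done(self) -> None:
        # "!deploy done": back to the open state.
        self.holder = None
        self.activity = None

    def topic(self) -> str:
        # Render the fragment that would be placed in the channel /topic.
        if self.holder and self.activity:
            return f"DeployStatus: {self.holder} has the stick, {self.activity}"
        if self.holder:
            return f"DeployStatus: {self.holder} has the deploy stick"
        if self.activity:
            return f"DeployStatus: {self.activity}"
        return "DeployStatus: Open"
```

A deploy script would set `activity` when it starts and clear it when it ends; the bot would call `topic()` after each change.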
Between the "take" and "done", as deploy scripts are run, it would update with, eg:
"DeployStatus: <nick> has the stick, Scap'ing"
Comment 1 MZMcBride 2014-02-11 03:44:11 UTC
(In reply to comment #0)
> Granted, there are the logmsgbot messages about when certain commands are
> run/started (and in some limited cases, completed, eg localization), but
> those can be easily missed if puppet decides that a bunch of hosts are fresh
> or not fresh.

You can know if someone is pushing code by looking at the server admin log or scrollback. If the logging needs to be augmented, we should discuss that.

The issue with Puppet freshness warnings is mostly irrelevant here. That's quite clearly a separate issue that should be resolved. (Though it's pretty hopeless now, I imagine... anyone sane ignored that bot long ago.)

I'm not sure there's a real bug here. Setting unconfirmed accordingly.
Comment 2 Krinkle 2014-05-27 22:53:59 UTC
The server admin log is also viewable directly, so whether a script has been started can be seen there.

The main problem is scripts that take a while to run (e.g. anything more than 10 seconds) but only report when the action is completed. I think it'd be worthwhile to consider changing our scap scripts to log at the start instead of afterwards.

Doing so would both communicate the action earlier and ensure it is logged at all (e.g. in case of a critical failure or an aborted run).
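A rough sketch of that idea, not how the scap scripts are actually written: a wrapper that logs before the action runs, so the entry exists even if the action later fails or is aborted. The `log` callable is a hypothetical stand-in for whatever posts to the server admin log. Also logging a "Finished" or "Failed" line afterwards is an addition on top of the suggestion above.

```python
import contextlib
from typing import Callable, Iterator


@contextlib.contextmanager
def logged_action(name: str, log: Callable[[str], None]) -> Iterator[None]:
    """Log when an action starts, then log whether it finished or failed."""
    log(f"Started {name}")  # emitted before any work is done
    try:
        yield
    except BaseException as exc:  # includes KeyboardInterrupt, i.e. an abort
        log(f"Failed {name}: {exc!r}")
        raise
    else:
        log(f"Finished {name}")
```

Usage would look like `with logged_action("scap", post_to_sal): run_scap()`, where both `post_to_sal` and `run_scap` are placeholders.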
Comment 3 Krinkle 2014-05-27 23:01:47 UTC
I think the /topic changes would get rather verbose and create even more channel noise. If we want, we could equip a bot with a PRIVMSG status protocol.

E.g. !deploy in #wikimedia-operations (or in PM to the bot directly) would respond (in the channel or in PM) with a dynamically inferred deployment status (based on it tracking !log calls it should be able to determine whether a scap/sync is running).
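A small sketch of the inference part, assuming the bot sees messages shaped like "!log <user> Started scap: ..." and "!log <user> Finished scap: ...". That message shape and the function name are assumptions for illustration. Sync commands that only log on completion cannot be detected as running this way.

```python
import re
from typing import Iterable

# Assumed message shape: "!log <user> Started scap: <msg>" or
# "!log <user> Finished scap: <msg>". Anything else is ignored.
_SCAP_RE = re.compile(r"^!log (?P<user>\S+) (?P<event>Started|Finished) scap\b")


def deploy_status(messages: Iterable[str]) -> str:
    """Infer from !log messages seen so far whether a scap is still running."""
    running = set()
    for msg in messages:
        match = _SCAP_RE.match(msg)
        if match is None:
            continue
        if match.group("event") == "Started":
            running.add(match.group("user"))
        else:
            running.discard(match.group("user"))
    if running:
        return "scap running: " + ", ".join(sorted(running))
    return "no scap running"
```

The bot would feed every !log line it observes into this and reply with the result when someone sends "!deploy".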

And for assignee we have the deployment calendar/schedule I suppose. But I see how it can be useful to have this information more readily available.

Setting aside the channel noise from puppet for a moment, I think one way to go about this is simply by social convention. E.g. !deploy doesn't have to do anything for it to have meaning. It'd just be like stating the obvious or telling greg-g, except you're doing it in a more scannable way.

e.g.

<FooBar> !deploy take
..
<logmsgbot> !log fbar synchronized wmf-config/InitialiseSettings.php
..
<logmsgbot> !log fbar Started scap: OH YEAH
..
<logmsgbot> !log fbar Finished scap: OH YEAH
..
<MrQuux> meh..?
<FooBar> meh
..
<MrQuux> !deploy take
..
<logmsgbot> !log quux synchronized php-1.24wmf6/includes/HistoryBlob.php 'Backport fix for X'
..
..
..
<MrQuux> !deploy done
Comment 4 Greg Grossmeier 2014-05-27 23:13:44 UTC
(In reply to Krinkle from comment #3)
> I think the /topic changes would get rather verbose and create even more
> channel noise. If we want, we could equip a bot with a PRIVMSG status
> protocol.
> <snip>

You basically described https://github.com/etsy/PushBot, which I tried with a few people in a dummy channel. It's pretty limited and has no ACLs (i.e. anyone could do "!deploy whatever" and clutter up the /topic or message output), so I scrapped the idea for now.

Proposal:
Having /topic change once when a git-deploy/scap run starts and once when it completes would probably not be overkill, except maybe during SWAT deploys.

I think it's hard in our case: it makes sense in a private, deployer-only channel (i.e. Etsy's setup) but probably not in ours, at least for now.
Comment 5 Antoine "hashar" Musso (WMF) 2014-05-30 10:10:20 UTC
From Greg's initial comment:

> Especially during lightning deploys, but other times as well, 2 or more teams
> are coordinating around each other to deploy code. 

And the follow-up proposes a soft locking mechanism. Updating the channel topic is a possible implementation, but I don't think it is the best one, for a few reasons:

- it adds another line of spam in the channel (e.g. "soandso changed topic")
- I am pretty sure most people don't look at the topic but at SAL instead
- it is not really accessible from the deployment server


IIRC Bryan has been working toward migrating the scap utility to use git-deploy, which AFAIK has a locking mechanism. We can probably add some soft/hard locking in scap if that is deemed urgent, but given we have never had any locking, I think we can wait for git-deploy.
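For illustration only, a soft lock on the deployment server could be as small as an exclusively created file. This is a sketch under assumptions: the lock path, the function name and the exception name are made up, and it says nothing about how git-deploy implements its own locking.

```python
import contextlib
import os
from typing import Iterator


class DeployLocked(RuntimeError):
    """Raised when another deploy already holds the lock file."""


@contextlib.contextmanager
def deploy_lock(path: str, owner: str) -> Iterator[None]:
    """Hold an advisory lock file for the duration of a deploy."""
    try:
        # O_EXCL makes creation fail if the file already exists.
        fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o644)
    except FileExistsError:
        try:
            with open(path) as f:
                holder = f.read().strip() or "unknown"
        except OSError:
            holder = "unknown"  # lock vanished or is unreadable
        raise DeployLocked(f"deploy in progress by {holder}") from None
    try:
        with os.fdopen(fd, "w") as f:
            f.write(owner)  # record who holds the lock
        yield
    finally:
        os.unlink(path)  # release the lock, even if the deploy failed
```

A deploy script would wrap its work in `with deploy_lock(lock_path, user): ...`. A crashed process would leave the file behind, so a real implementation would need some way to clear stale locks.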


Greg, maybe we can talk about it during the QA/Release checkin or on one of the lists?


