Last modified: 2014-08-14 20:48:15 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and kept for historical purposes. It is not possible to log in, and apart from displaying bug reports and their history, links might be broken. See T64667, the corresponding Phabricator task, for complete and up-to-date bug report information.
Bug 62667 - [scap] Deploy events aren't showing up in graphite/gdash
Status: NEW
Product: Wikimedia
Classification: Unclassified
Component: Deployment systems
Version: wmf-deployment
Hardware: All All
Importance: High normal
Target Milestone: ---
Assigned To: Nobody - You can work on this!
Keywords: ops
Depends on:
Blocks:
Reported: 2014-03-14 22:01 UTC by Greg Grossmeier
Modified: 2014-08-14 20:48 UTC (History)

Description Greg Grossmeier 2014-03-14 22:01:57 UTC
They used to show up in gdash when you ticked the "Show Code Deploys" checkbox.

Adding "&target=drawAsInfinite(deploy.any)" to the graphite URLs doesn't work either :/
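
For reference, a minimal sketch of how such a URL can be assembled. It assumes the render endpoint https://graphite.wikimedia.org/render (the one linked later in this report); the helper name `with_deploy_marker` and the metric `some.metric` are hypothetical and used only for illustration. `drawAsInfinite` is the Graphite function that draws non-zero datapoints as vertical lines.

```python
from urllib.parse import urlencode


def with_deploy_marker(base_url, targets, marker="deploy.any"):
    """Build a Graphite render URL that overlays `marker` as vertical lines.

    `with_deploy_marker` is a hypothetical helper, not part of gdash or scap.
    """
    params = [("target", t) for t in targets]
    # drawAsInfinite() renders every non-zero datapoint as a vertical line.
    params.append(("target", f"drawAsInfinite({marker})"))
    return f"{base_url}?{urlencode(params)}"


# "some.metric" is a placeholder for whatever the graph normally plots.
url = with_deploy_marker("https://graphite.wikimedia.org/render", ["some.metric"])
print(url)
```

If the marker metric has no datapoints in the requested window, the graph renders without any vertical lines, which matches the symptom described above.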
Comment 1 Bryan Davis 2014-03-14 22:04:30 UTC
See also https://rt.wikimedia.org/Ticket/Display.html?id=6970
Comment 2 Gerrit Notification Bot 2014-03-18 20:00:17 UTC
Change 119339 had a related patch set uploaded by BryanDavis:
Fix MW_STATSD_PORT to point to correct listener

https://gerrit.wikimedia.org/r/119339
Comment 3 Gerrit Notification Bot 2014-03-18 20:10:13 UTC
Change 119340 had a related patch set uploaded by BryanDavis:
Fix statsd_port value

https://gerrit.wikimedia.org/r/119340
Comment 4 Bryan Davis 2014-03-18 20:13:26 UTC
After these patches land and we get some data in graphite again I think we'll need to look at the gdash configuration and update the metric names that it uses to identify deployments as well. deploy2graphite and scap send different metrics to graphite.
Comment 5 Gerrit Notification Bot 2014-03-18 20:20:07 UTC
Change 119340 merged by jenkins-bot:
Fix statsd_port value

https://gerrit.wikimedia.org/r/119340
Comment 6 Gerrit Notification Bot 2014-03-18 20:20:16 UTC
Change 119339 merged by Ori.livneh:
Fix MW_STATSD_PORT to point to correct listener

https://gerrit.wikimedia.org/r/119339
Comment 7 Bryan Davis 2014-03-19 01:53:27 UTC
See https://gerrit.wikimedia.org/r/#/c/111409/ for the change from carbon to statsd that should have been accompanied by a change to the gdash configuration and port number as well.
Comment 8 Bryan Davis 2014-03-19 03:39:46 UTC
When we figure out what all the new deploy metrics are, they should be added to templates/gdash/deploy_addon.erb in operations/puppet.git to fix the marks added.
Comment 9 Bryan Davis 2014-03-19 03:45:38 UTC
There is some additional problem with the current gdash configuration.

When the "Show Code Deploys" checkbox is active, something is causing the generated graphite URLs to contain an extraordinary number of superfluous ampersands. In one URL I just examined there are 4188 extra ampersands inserted between the deployment metric stanzas and the remainder of the graph description.

When these ampersands are removed from the graphite URL the graph renders (albeit with no deploy markers).
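
A small sketch of the manual check described above: count the superfluous ampersands in a generated URL and collapse them so the graph can be rendered for comparison. This is only a diagnostic aid and does not address whatever in the gdash configuration emits them; the function names and the example host are hypothetical.

```python
import re


def count_extra_ampersands(url):
    """Count the '&' characters that exceed one per separator run."""
    return sum(len(run) - 1 for run in re.findall(r"&{2,}", url))


def collapse_ampersands(url):
    """Collapse runs of '&' in the query string into a single separator."""
    base, sep, query = url.partition("?")
    if not sep:
        return url  # no query string, nothing to clean up
    query = re.sub(r"&{2,}", "&", query).strip("&")
    return f"{base}?{query}"


# Hypothetical URL with the kind of padding seen in this report.
broken = "https://graphite.example/render?target=a" + "&" * 4189 + "target=b"
print(count_extra_ampersands(broken))
print(collapse_ampersands(broken))
```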
Comment 10 Greg Grossmeier 2014-03-19 04:07:59 UTC
For what it's worth, I was seeing graphite urls like that (tons of &s) on Friday the 14th.
Comment 11 Bryan Davis 2014-04-01 00:17:16 UTC
The configuration changes now have data being recorded in graphite for scap runs again, but there are three remaining issues:

1) The metric names have changed. The gdash configuration is looking to add the metrics "deploy.sync-common-file", "deploy.sync-common-all" and "deploy.scap" to the graph. With the change from direct carbon communication to statsd and the changes to scap code, these metric names have changed. "scap.scap.count" should be the equivalent of the old "deploy.scap" metric.

2) In theory the metrics for "deploy.sync-common-file" and "deploy.sync-common-all" should just need a ".count" added to them, but I'm not currently seeing metrics with those names in graphite at all.

3) The txstatsd recorded stats for "scap.scap.count" don't look right at all. I would expect graphite to be recording the aggregate sum of the "scap.scap:1|c" calls seen in the last minute which would typically be 0 and occasionally be 1 (or possibly 2 with aborted scaps). Instead it seems to be recording a value of 1.0 every minute with occasional values of 5.0 that are not correlated with other scap logging output. [0]


[0]: https://graphite.wikimedia.org/render?from=23%3A00_20140331&until=00%3A00_20140401&target=scap.scap.count&format=json
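
To make the expectation in point 3 concrete, here is a rough sketch of per-interval counter aggregation: each "scap.scap:1|c" packet increments a counter, and the sum is reported once per flush interval. The 60-second interval and the function names are assumptions for illustration. This is not how txstatsd is implemented, only the behavior the comment expects from it, and it handles only the plain "name:value|c" form without a sample rate.

```python
FLUSH_INTERVAL = 60  # seconds; assumed one-minute flush, as described above


def parse_counter(packet):
    """Parse a plain statsd counter packet 'name:value|c' into (name, value)."""
    name, _, rest = packet.partition(":")
    value, _, kind = rest.partition("|")
    if kind != "c":
        raise ValueError(f"not a plain counter packet: {packet!r}")
    return name, float(value)


def aggregate(events, start, end, metric):
    """Sum counter increments for `metric` per flush interval in [start, end).

    `events` is an iterable of (unix_timestamp, packet) pairs.
    """
    buckets = [0.0] * ((end - start) // FLUSH_INTERVAL)
    for ts, packet in events:
        name, value = parse_counter(packet)
        if name == metric and start <= ts < end:
            buckets[int((ts - start) // FLUSH_INTERVAL)] += value
    return buckets


# One scap run at t=130s within a five-minute window.
events = [(130, "scap.scap:1|c")]
print(aggregate(events, 0, 300, "scap.scap"))
```

Under this model a window with a single scap run yields zeros everywhere except the one interval containing the run, which is the shape the comment expected to see for "scap.scap.count" instead of a constant 1.0.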
Comment 12 Bryan Davis 2014-04-09 19:51:37 UTC
Assigning to Ori in the hope that he can find some time to look into the txstatsd behavior and the missing metrics. Once those issues are fixed it should be pretty easy to correct the gdash configuration.
