Last modified: 2014-04-21 18:39:43 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and for historical purposes. It is not possible to log in and except for displaying bug reports and their history, links might be broken. See T66088, the corresponding Phabricator task for complete and up-to-date bug report information.
Bug 64088 - Replication checks disabled in Icinga for most analytics slaves
Status: NEW
Product: Analytics
Classification: Unclassified
Component: General/Unknown
Version: unspecified
Hardware: All All
Importance: Normal normal
Target Milestone: ---
Assigned To: Nobody - You can work on this!
Reported: 2014-04-18 14:30 UTC by christian
Modified: 2014-04-21 18:39 UTC (History)

Description christian 2014-04-18 14:30:22 UTC
Icinga's replication checks are disabled for 6/7 analytics slaves.

Let's get them turned back on, so that Icinga alerts our team about replication lag again.

Icinga shows the following relevant services disabled:

* s1-analytics-slave.eqiad.wmnet (db1047.eqiad.wmnet)
** MySQL Replication Heartbeat
** MySQL Slave Delay

* s2-analytics-slave.eqiad.wmnet (db69.pmtpa.wmnet)
** MySQL Replication Heartbeat
** MySQL Slave Delay
** MySQL Slave Running

* s3-analytics-slave.eqiad.wmnet (db71.pmtpa.wmnet)
** MySQL Replication Heartbeat
** MySQL Slave Delay

* s4-analytics-slave.eqiad.wmnet (db72.pmtpa.wmnet)
<none>

* s4-analytics-slave.eqiad.wmnet (db1017.eqiad.wmnet)
** MySQL Replication Heartbeat
** MySQL Slave Delay

* s6-analytics-slave.eqiad.wmnet (db74.pmtpa.wmnet)
** MySQL Replication Heartbeat
** MySQL Slave Delay

* s7-analytics-slave.eqiad.wmnet (db68.pmtpa.wmnet)
** MySQL Replication Heartbeat
** MySQL Slave Delay
Comment 1 christian 2014-04-18 14:30:59 UTC
It seems no one in our team knows why the alerts are disabled, so I
pinged springle about it.
Comment 2 Bingle 2014-04-18 14:35:29 UTC
Prioritization and scheduling of this bug is tracked on Mingle card https://wikimedia.mingle.thoughtworks.com/projects/analytics/cards/cards/1555
Comment 3 christian 2014-04-18 21:22:10 UTC
Discussion with springle showed that the Icinga alerts are turned off
on purpose, as they go off too often (due to slow queries run by
analytics :-) ).

Since a separate machine for slow queries is already on the way,
springle suggested waiting for this machine and, once slow queries
have been migrated over, turning the Icinga alerts for the other
machines back on.

Until then I'll keep an eye on the lag and send out alerts if it gets
too high.
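
For illustration only, a minimal sketch of what such an interim manual
check could look like. This is not the script actually used: the
threshold, the function name, and the sample readings are all
assumptions. The lag values are assumed to have been collected by hand,
for example from Seconds_Behind_Master in SHOW SLAVE STATUS on each
slave.

```python
from typing import Dict, List

# Hypothetical threshold in seconds; the bug report does not state one.
MAX_LAG_SECONDS = 3600


def lagging_hosts(lag_by_host: Dict[str, float],
                  max_lag: float = MAX_LAG_SECONDS) -> List[str]:
    """Return the hosts whose replication lag exceeds max_lag, sorted by name."""
    return sorted(host for host, lag in lag_by_host.items() if lag > max_lag)


# Made-up readings for illustration; real values would come from the slaves.
readings = {
    "s1-analytics-slave.eqiad.wmnet": 12.0,
    "s2-analytics-slave.eqiad.wmnet": 7200.0,
}

for host in lagging_hosts(readings):
    print(f"replication lag too high on {host}")
```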
Comment 4 Toby Negrin 2014-04-21 18:39:43 UTC
Thanks for catching this, Christian!

-Toby
