Last modified: 2014-10-16 12:48:59 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and for historical purposes. It is not possible to log in and except for displaying bug reports and their history, links might be broken. See T65362, the corresponding Phabricator task for complete and up-to-date bug report information.
Bug 63362 - WMFLabs: Ganglia down / needs reinstall
Status: RESOLVED WONTFIX
Product: Wikimedia Labs
Classification: Unclassified
Component: Infrastructure
Version: unspecified
Hardware: All All
Importance: High normal
Target Milestone: ---
Assigned To: Yuvi Panda
Depends on:
Blocks: 54710
Reported: 2014-04-01 11:50 UTC by se4598
Modified: 2014-10-16 12:48 UTC
CC List: 8 users

Description se4598 2014-04-01 11:50:11 UTC
The Ganglia installation at http://ganglia.wmflabs.org/ seems to be missing.
All I see there is a single file, http://ganglia.wmflabs.org/latest/conf.php, and nothing more.
Comment 1 Krinkle 2014-04-01 20:01:24 UTC
Indeed. It broke down after migration to eqiad. I think andrewbogott did some work on it to restore it, but it went from a non-responding server to a server with an empty directory listing with conf.php. Not sure what's going on.
Comment 2 Antoine "hashar" Musso (WMF) 2014-04-01 20:14:51 UTC
I am volunteering for this.  Bryan "bd808" Davis seems interested and Andrew Bogott already proposed to review changes.

I did some config tweaks with: https://gerrit.wikimedia.org/r/#/c/122790/

Figured out how to install the PHP material (we need to git clone to /usr/share/ganglia-webfrontend because the Debian package we ship does not have any PHP material).
Comment 3 Gerrit Notification Bot 2014-04-01 20:16:08 UTC
Change 123040 had a related patch set uploaded by Hashar:
ganglia: fix some missing paths for labs

https://gerrit.wikimedia.org/r/123040
Comment 4 Gerrit Notification Bot 2014-04-01 21:14:47 UTC
Change 123040 merged by Andrew Bogott:
ganglia: fix some missing paths for labs

https://gerrit.wikimedia.org/r/123040
Comment 5 Gerrit Notification Bot 2014-04-01 21:24:35 UTC
Change 123044 had a related patch set uploaded by Hashar:
ganglia: graphdir must be an absolute path

https://gerrit.wikimedia.org/r/123044
Comment 6 Gerrit Notification Bot 2014-04-01 21:27:15 UTC
Change 123044 merged by Andrew Bogott:
ganglia: graphdir must be an absolute path

https://gerrit.wikimedia.org/r/123044
Comment 7 Andre Klapper 2014-04-01 21:42:23 UTC
http://ganglia.wmflabs.org/latest/ seems to be back up?
Anything left to do here?
Comment 8 Antoine "hashar" Musso (WMF) 2014-04-02 20:52:26 UTC
We need a stronger instance in labs.  Ganglia is a bit CPU heavy.  I will attempt to get the instance resized; otherwise I will rebuild it from scratch again =)
Comment 9 Antoine "hashar" Musso (WMF) 2014-04-11 12:33:31 UTC
Will poke at it again next week with Andrew Bogott. I would like us to attempt to resize the instance to a bigger profile (1 cpu -> 4 cpu).  Otherwise we will spawn a new instance and update the configuration to point all gmond to the new IP.
Comment 10 Krinkle 2014-06-12 00:46:32 UTC
Seems to be down still (or again).

> There was an error collecting ganglia data (127.0.0.1:8654): fsockopen error: Connection refused
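
The error above says the web frontend could not open a TCP connection to 127.0.0.1:8654. A minimal sketch of checking the same thing by hand, assuming Python is available on the instance (the helper name `port_accepts_connections` is made up for illustration and is not part of Ganglia):

```python
import socket


def port_accepts_connections(host, port, timeout=2.0):
    """Return True if a TCP connection to (host, port) succeeds, else False."""
    try:
        # create_connection raises an OSError subclass (for example
        # ConnectionRefusedError) when nothing is listening on the port.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # Address and port taken from the error message quoted above.
    print(port_accepts_connections("127.0.0.1", 8654))
```

A False result here would match the "Connection refused" message, which suggests the daemon expected on that port is not running or not listening there.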
Comment 11 Greg Grossmeier 2014-08-28 21:10:09 UTC
(In reply to Antoine "hashar" Musso from comment #9)
> Will poke at it again next week with Andrew Bogott. I would like us to
> attempt to resize the instance to a bigger profile (1 cpu -> 4 cpu).  Else
> we will spawn a new instance and update the configuration to point all gmond
> to the new IP.

(That was from April 11th, 2014)

Andrew: Can we get some help fixing this Ganglia issue on wmflabs? Maybe Coren or Yuvi?
Comment 12 Yuvi Panda 2014-08-28 21:14:05 UTC
Ganglia is dead, long live Graphite.

We had a working graphite.wmflabs.org instance for a while, but we ran into the same problems with graphite that ganglia ran into.

So we have provisioned a 'real' machine (labmon1001) that will collect the stats. Graphite and txstatsd are now provisioned on that machine, and I'm awaiting some network config to be completed (RT #8163) before I can turn on stats collection. 

After that I'll have to write some way of autogenerating a nominal set of graphs in a graphite-like way by default for all machines (See http://tools.wmflabs.org/giraffe/index.html#dashboard=ToolLabs+Basics&timeFrame=1h for a prototype for toollabs only).
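
One possible shape for such autogeneration, as a rough sketch only: build a Graphite render URL per host and metric. The `/render` endpoint and its `target`, `from` and `format` parameters are part of Graphite's render API; the metric suffixes, host names, base URL and the function name `default_graph_urls` are all hypothetical and do not reflect what was actually deployed.

```python
from urllib.parse import urlencode

# Hypothetical metric suffixes; real names depend on how the collector is configured.
DEFAULT_METRICS = ("cpu.total.user", "memory.MemFree", "loadavg.01")


def default_graph_urls(base_url, hosts, metrics=DEFAULT_METRICS, time_frame="-1h"):
    """Map (host, metric) to a Graphite render URL for a default set of graphs."""
    urls = {}
    for host in hosts:
        for metric in metrics:
            query = urlencode({
                "target": f"{host}.{metric}",  # assumed naming scheme: <host>.<metric>
                "from": time_frame,            # relative time window, e.g. the last hour
                "format": "png",
            })
            urls[(host, metric)] = f"{base_url.rstrip('/')}/render?{query}"
    return urls


if __name__ == "__main__":
    # Placeholder base URL and host name, for illustration only.
    for url in default_graph_urls("http://graphite.example.org", ["host1"]).values():
        print(url)
```

Generating the URLs from a host list keeps the default dashboard the same for every machine, which is the "by default for all machines" part of the plan.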
Comment 13 Antoine "hashar" Musso (WMF) 2014-08-28 21:35:19 UTC
For history purposes:

Ganglia on labs dies because it is on a small instance.  Andrew attempted a resize via nova but that definitely does not work.

Since:
- ganglia is not fully puppetized
- changing the IP is not straightforward (update all manifests, make sure puppet runs on all instances)

I happily gave up to focus™ on other things.


Yuvi essentially took over, as he explained, and that includes ditching Ganglia in favor of some real hardware and diamond -> graphite.

Yuvi: +1 on Giraffe :-)
Comment 14 Yuvi Panda 2014-10-16 12:48:59 UTC
We have graphite.wmflabs.org and https://tools.wmflabs.org/nagf now :)
