Last modified: 2014-04-21 18:53:09 UTC
Although other worker nodes in the cluster (e.g.: analytics1016 [1]) show Hadoop metric groups like:

* Hadoop.DataNode metrics
* Hadoop.NodeManager metrics
* Hadoop.NodeManager.jvm_memory metrics

ganglia does not show them for analytics1012:

http://ganglia.wikimedia.org/latest/?r=hour&cs=&ce=&c=Analytics+cluster+eqiad&h=analytics1012.eqiad.wmnet&tab=m&vn=&hide-hf=false&mc=2&z=medium&metric_group=NOGROUPS

Should not those metrics also show up for analytics1012?

[1] http://ganglia.wikimedia.org/latest/?r=hour&cs=&ce=&c=Analytics+cluster+eqiad&h=analytics1016.eqiad.wmnet&tab=m&vn=&hide-hf=false&mc=2&z=medium&metric_group=NOGROUPS
Bug 63470 might be related.
Prioritization and scheduling of this bug is tracked on Mingle card https://wikimedia.mingle.thoughtworks.com/projects/analytics/cards/cards/1523
YES! Found it. /etc/hosts had a bad IP listed on analytics1012 for itself. Fixed and things look much better now!
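For anyone hitting something similar, here is a rough sketch of how a host's own /etc/hosts entry could be checked against the address it is supposed to have. It is only an illustration, not what was actually run on analytics1012: the function names `ips_for_host` and `bad_entries` are made up, and the 192.0.2.x addresses are placeholders from a documentation range, since the real addresses are not recorded in this bug.

```python
import ipaddress


def ips_for_host(hosts_text, hostname):
    """Return every address that the hosts-file text assigns to hostname."""
    found = []
    for raw in hosts_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        address, *names = line.split()
        if hostname in names:
            found.append(address)
    return found


def bad_entries(hosts_text, hostname, expected_ip):
    """Return the addresses listed for hostname that differ from expected_ip."""
    expected = ipaddress.ip_address(expected_ip)
    bad = []
    for address in ips_for_host(hosts_text, hostname):
        try:
            if ipaddress.ip_address(address) != expected:
                bad.append(address)
        except ValueError:
            bad.append(address)  # not a parseable address at all
    return bad


# Placeholder data: 192.0.2.x is a documentation range, not the real addresses.
SAMPLE = """\
127.0.0.1 localhost
192.0.2.99 analytics1012.eqiad.wmnet analytics1012  # stale entry
"""

print(bad_entries(SAMPLE, "analytics1012.eqiad.wmnet", "192.0.2.12"))
```

On a real host the text would come from reading /etc/hosts, and the expected address would have to come from an authoritative source such as DNS. An empty list means the entries for that name agree with the expected address.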
Fixed by Andrew. Ganglia now shows the metric groups and corresponding metrics.
Stupid me. Looked at the wrong host. Sorry. Reopening: analytics1012 is still missing the metric groups from this bug's description.
I see them, closing, unless I am wrong. http://ganglia.wikimedia.org/latest/?r=hour&cs=&ce=&c=Analytics+cluster+eqiad&h=analytics1012.eqiad.wmnet&tab=m&vn=&hide-hf=false&mc=2&z=medium&metric_group=NOGROUPS