As the number of metrics in your environment grows, you start to see a large impact on system I/O performance. Eventually disk utilization stays at 100% and the CPU spends much of its time waiting for I/O, because the I/O load exceeds what the disks can theoretically deliver. You might add fast 15K RPM disks or a RAID 10 array, but the disk contention persists as long as you keep expanding the cluster and adding new metrics.
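To confirm that the bottleneck really is I/O wait before changing anything, iostat -x (from the sysstat package) reports per-disk %util and the CPU %iowait. Where sysstat is not installed, the sketch below computes the CPU iowait share directly from /proc/stat. The function names are hypothetical, and it assumes a Linux kernel exposing the standard aggregate "cpu" line.

```bash
#!/bin/bash
# Sketch: measure the share of CPU time spent in I/O wait over an interval,
# using the aggregate "cpu" line of /proc/stat (Linux only, no sysstat needed).

# read_cpu_ticks: print "<iowait ticks> <total ticks>" accumulated since boot.
# Fields 2-9 are user nice system idle iowait irq softirq steal; guest time is
# already included in user, so any later fields are deliberately skipped.
read_cpu_ticks() {
  awk '$1 == "cpu" {
         total = 0
         for (i = 2; i <= 9; i++) total += $i
         printf "%.0f %.0f\n", $6, total
         exit
       }' /proc/stat
}

# iowait_percent [SECONDS]: integer % of CPU time spent in iowait during the interval.
iowait_percent() {
  local interval=${1:-5} w1 t1 w2 t2
  read -r w1 t1 < <(read_cpu_ticks)
  sleep "$interval"
  read -r w2 t2 < <(read_cpu_ticks)
  if (( t2 <= t1 )); then
    echo 0
    return
  fi
  local pct=$(( 100 * (w2 - w1) / (t2 - t1) ))
  # Some kernels can report a slightly decreasing iowait counter; clamp at 0.
  (( pct < 0 )) && pct=0
  echo "$pct"
}
```

For example, running iowait_percent 10 on the gmetad host during a busy period and seeing a persistently high value, together with disks pinned near 100% in iostat -x, matches the symptom described above.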
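A simple solution is to add the rrdcached daemon as a layer in between: instead of every update going straight to disk, rrdcached collects updates in memory and writes each RRD file in larger, less frequent batches. There are a few things to consider, such as upgrading the rrdtool package and recompiling ganglia against the new rrdtool.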
Here is a step-by-step guide.
A. Steps for building and installing rrdtool and ganglia.
1. Uninstall the existing rrdtool
yum -y remove rrdtool rrdtool-perl
2. Download rrdtool 1.4.7 (the base package, the Perl bindings and the development headers)
wget http://apt.sw.be/redhat/el5/en/x86_64/dag/RPMS/rrdtool-1.4.7-1.el5.rf.x86_64.rpm
wget http://apt.sw.be/redhat/el5/en/x86_64/dag/RPMS/perl-rrdtool-1.4.7-1.el5.rf.x86_64.rpm
wget http://apt.sw.be/redhat/el5/en/x86_64/dag/RPMS/rrdtool-devel-1.4.7-1.el5.rf.x86_64.rpm
3. Install all three packages in a single rpm transaction; installing them one at a time fails with confusing Perl dependency errors.
rpm -ivh rrdtool-1.4.7-1.el5.rf.x86_64.rpm rrdtool-devel-1.4.7-1.el5.rf.x86_64.rpm perl-rrdtool-1.4.7-1.el5.rf.x86_64.rpm
4. Get the ganglia source rpm
wget -O ganglia-3.4.0-1.src.rpm "http://downloads.sourceforge.net/project/ganglia/ganglia%20monitoring%20core/3.4.0/ganglia-3.4.0-1.src.rpm?r=http%3A%2F%2Fsourceforge.net%2Fprojects%2Fganglia%2Ffiles%2Fganglia%2520monitoring%2520core%2F3.4.0%2F&ts=1360839947&use_mirror=citylan"
5. Install the dependencies required for building the rpm.
yum -y install libpng-devel libart_lgpl-devel python-devel libconfuse-devel pcre-devel freetype-devel
6. Build the rpm
rpm -ivh ganglia-3.4.0-1.src.rpm
rpmbuild -ba /usr/src/redhat/SPECS/ganglia.spec
7. Install the newly built ganglia packages
rpm -ivh /usr/src/redhat/RPMS/x86_64/ganglia-* /usr/src/redhat/RPMS/x86_64/libganglia-3.4.0-1.x86_64.rpm
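Before moving on to the rrdcached configuration, it is worth checking that the new rrdtool is really the one installed and that the rebuilt gmetad links against librrd. The sketch below is one way to do that; the function names are hypothetical, the minimum version 1.4.7 simply matches the packages installed above, and /usr/sbin/gmetad is assumed to be where the ganglia-gmetad package puts the daemon.

```bash
#!/bin/bash
# Sketch: sanity checks after installing rrdtool 1.4.7 and the rebuilt ganglia.
# Function names are hypothetical; package names match the steps above.

# version_ge A B: succeed if dotted version A >= B. Numeric components only.
# Written in plain bash because the coreutils shipped with EL5 lacks "sort -V".
version_ge() {
  local IFS=.
  local -a a=($1) b=($2)
  local i x y
  for (( i = 0; i < ${#a[@]} || i < ${#b[@]}; i++ )); do
    x=${a[i]:-0}
    y=${b[i]:-0}
    if (( 10#$x > 10#$y )); then return 0; fi
    if (( 10#$x < 10#$y )); then return 1; fi
  done
  return 0
}

# check_rrdtool [MIN]: verify the installed rrdtool RPM is at least MIN.
check_rrdtool() {
  local min=${1:-1.4.7} have
  if ! rpm -q rrdtool >/dev/null 2>&1; then
    echo "rrdtool RPM is not installed" >&2
    return 1
  fi
  # head -n 1: multilib systems may list more than one installed arch.
  have=$(rpm -q --qf '%{VERSION}\n' rrdtool | head -n 1)
  if version_ge "$have" "$min"; then
    echo "rrdtool $have OK (>= $min)"
  else
    echo "rrdtool $have is older than $min" >&2
    return 1
  fi
}

# links_librrd BINARY: succeed if BINARY is dynamically linked against librrd.
links_librrd() {
  ldd "$1" 2>/dev/null | grep -q librrd
}
```

For example, check_rrdtool 1.4.7 and links_librrd /usr/sbin/gmetad should both succeed once step 7 is done, and ldd /usr/sbin/gmetad | grep librrd shows which librrd file the rebuilt gmetad resolves to.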
B. Configuring rrdcached to work with ganglia.
1. gmetad runs as the ganglia user, rrdcached (run as ganglia via RRDC_USER below) writes to the RRD directory, and Apache needs access to the same directory. Add the ganglia user to the apache group:
usermod -a -G apache ganglia
2. Correct the ownership of the RRD directory:
chown -R ganglia:apache /var/lib/ganglia/rrds/
3. Update the rrdcached startup options in /etc/sysconfig/rrdcached, then point gmetad and the web frontend at the same socket:
cat /etc/sysconfig/rrdcached
OPTIONS="rrdcached -p /tmp/rrdcached.pid -s apache -m 664 -l unix:/tmp/rrdcached.sock -s apache -m 777 -P FLUSH,STATS,HELP -l unix:/tmp/rrdcached.limited.sock -b /var/lib/ganglia/rrds -B"
RRDC_USER=ganglia
cat /etc/sysconfig/gmetad
RRDCACHED_ADDRESS="unix:/tmp/rrdcached.sock"
grep rrdcached_socket /var/www/html/gweb/conf_default.php
$conf['rrdcached_socket'] = "/tmp/rrdcached.sock";
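4. Restart gmetad around the rrdcached start, so that it picks up RRDCACHED_ADDRESS once the socket exists:
service gmetad stop
service rrdcached start
service gmetad start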
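A quick way to confirm the switch worked is to check that rrdcached created its socket and that the running gmetad actually inherited RRDCACHED_ADDRESS. This sketch assumes that the gmetad init script exports the variable from /etc/sysconfig/gmetad and that librrd inside gmetad reads it to locate the daemon; the function names are hypothetical.

```bash
#!/bin/bash
# Sketch: after the restart, confirm rrdcached's socket exists and that the
# running gmetad inherited RRDCACHED_ADDRESS. Function names are hypothetical.

# proc_env_var PID NAME: print NAME from the environment PID was started with.
# /proc/PID/environ holds NUL-separated NAME=value pairs; reading another
# user's process requires root.
proc_env_var() {
  tr '\0' '\n' 2>/dev/null < "/proc/$1/environ" | sed -n "s/^$2=//p"
}

# check_rrdcached_setup [SOCKET]: report on the socket and on gmetad's setting.
check_rrdcached_setup() {
  local sock=${1:-/tmp/rrdcached.sock} status=0 pid addr
  if [ -S "$sock" ]; then
    echo "socket present: $sock"
  else
    echo "missing socket: $sock (is rrdcached running?)" >&2
    status=1
  fi
  pid=$(pidof -s gmetad 2>/dev/null || true)
  if [ -z "$pid" ]; then
    echo "gmetad is not running" >&2
    return 1
  fi
  addr=$(proc_env_var "$pid" RRDCACHED_ADDRESS)
  if [ "$addr" = "unix:$sock" ]; then
    echo "gmetad (pid $pid) uses RRDCACHED_ADDRESS=$addr"
  else
    echo "gmetad (pid $pid) has RRDCACHED_ADDRESS='$addr', expected unix:$sock" >&2
    status=1
  fi
  return "$status"
}
```

Run check_rrdcached_setup as root after service gmetad start. An empty address means gmetad is still writing the RRD files directly rather than going through the cache.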
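The single OPTIONS line in /etc/sysconfig/rrdcached is dense. rrdcached applies -s (socket group), -m (socket mode) and -P (allowed commands) to the -l socket that follows them, so the line defines two sockets: a full-access socket for gmetad and gweb, and a limited one that only accepts FLUSH, STATS and HELP. Mode 664 on the first socket lets only the owner and the apache group connect, since connecting to a UNIX socket requires write permission. -b sets the base directory and -B refuses writes outside it. Assuming, as is conventional for /etc/sysconfig, that the init script sources the file with the shell, the same value can be laid out one socket per line with comments; this is an equivalent layout of the line above, not a different configuration.

```bash
# /etc/sysconfig/rrdcached (same options as above, one socket per line)
# -p          PID file
# -s / -m     group and file mode for the NEXT -l socket
# -P          commands allowed on the NEXT -l socket (default: all)
# -l          listen address
# -b / -B     base directory for RRD files / refuse writes outside it
# Backslash-newline inside double quotes is removed by the shell,
# so OPTIONS ends up with the same words as the one-line version.
OPTIONS="rrdcached -p /tmp/rrdcached.pid \
 -s apache -m 664 -l unix:/tmp/rrdcached.sock \
 -s apache -m 777 -P FLUSH,STATS,HELP -l unix:/tmp/rrdcached.limited.sock \
 -b /var/lib/ganglia/rrds -B"
RRDC_USER=ganglia
```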