Tuesday, July 23, 2013

Push Thrift Metrics to Ganglia in CDH4

 

 Add the following entries to /etc/hbase/conf/hadoop-metrics.properties (or $HBASE_HOME/conf/hadoop-metrics.properties):

 

===========

thriftserver.class=org.apache.hadoop.metrics.ganglia.GangliaContext31

thriftserver.period=10

thriftserver.servers=<ganglia_multicast_address>:<ganglia_port>

===========

 

Restart the HBase Thrift server and you should see the Thrift metrics in the Ganglia interface.
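
On a CDH4 package install the Thrift server init script is hbase-thrift, so the restart would be (assuming the stock CDH4 service name):

service hbase-thrift restart

To double check that the metrics are arriving, dump the XML from a gmond listening on the default tcp_accept_channel port 8649 and look for the thrift metric group (adjust host and port for your setup):

nc <gmond_host> 8649 | grep -i thrift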

Tuesday, March 12, 2013

No such device or address while trying to determine filesystem size on SSD in CentOS 5

[root@02 ~]# mkfs.ext4 /dev/sdb1
mke4fs 1.41.12 (17-May-2010)
mkfs.ext4: No such device or address while trying to determine filesystem size

Playing around with the usual stuff (df -h, mount, cat /proc/swaps, /etc/mtab, lsof) did not yield any results.

Some forum threads suggested that this is caused by leftover low-level RAID metadata on the disk, which can be inspected with dmsetup and dmraid.

List the device-mapper devices

[root@02 ~]# dmsetup ls
ddf1_4c5349202020202010000411100010044711471181ddbcc4 (253, 0)

Get some more information about the device

[root@02 ~]# dmsetup info ddf1_4c5349202020202010000411100010044711471181ddbcc4
Name: ddf1_4c5349202020202010000411100010044711471181ddbcc4
State: ACTIVE
Read Ahead: 256
Tables present: LIVE
Open count: 0
Event number: 0
Major, minor: 253, 0
Number of targets: 1
UUID: DMRAID-ddf1_4c5349202020202010000411100010044711471181ddbcc4

Discover all software RAID devices on the system

[root@02 ~]# dmraid -r
/dev/sdb: ddf1, ".ddf1_disks", GROUP, ok, 123045888 sectors, data@ 0

Get the properties of the RAID set

[root@02 ~]# dmraid -s
*** Group superset .ddf1_disks
--> Subset
name : ddf1_4c5349202020202010000411100010044711471181ddbcc4
size : 123045888
stride : 128
type : stripe
status : ok
subsets: 0
devs : 1
spares : 0

Try to deactivate the RAID device (this fails):

[root@02 ~]# dmraid -an
ERROR: dos: partition address past end of RAID device
The dynamic shared library "libdmraid-events-ddf1.so" could not be loaded:
libdmraid-events-ddf1.so: cannot open shared object file: No such file or directory

Try to erase the on-disk RAID metadata (this also errors out):

[root@02 ~]# dmraid -r -E /dev/sdb
Do you really want to erase "ddf1" ondisk metadata on /dev/sdb ? [y/n] :y
ERROR: ddf1: seeking device "/dev/sdb" to 32779907366912
ERROR: writing metadata to /dev/sdb, offset 64023256576 sectors, size 0 bytes returned 0
ERROR: erasing ondisk metadata on /dev/sdb

Finally, overwrite the entire disk with zeros so that it is no longer detected as a RAID device and does not cause further problems.

dd if=/dev/zero of=/dev/sdb
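
Zeroing the whole disk can take a long time on a big drive. Since the ddf1 metadata lives at the end of the disk, a quicker alternative (just a sketch, assuming the metadata fits within the last few megabytes) is to zero only the start and the tail of the device:

# wipe the partition table at the start of the disk
dd if=/dev/zero of=/dev/sdb bs=1M count=1
# wipe the last 4 MB of the disk, where the ddf1 metadata is kept
SECTORS=$(blockdev --getsz /dev/sdb)
dd if=/dev/zero of=/dev/sdb bs=512 seek=$((SECTORS - 8192)) count=8192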

Friday, February 15, 2013

CentOS 5 with ganglia and rrdcached

 

 As the number of metrics in your environment grows, you start to see a huge impact on system IO performance. At some stage the disk utilization stays pinned at 100% and the CPU spends a lot of time waiting for IO, because the write load exceeds the theoretical IO capacity of the disk. We might go ahead and add fast 15K disks or a RAID10 array, but the disk contention comes back as long as we keep expanding the cluster and adding new metrics.

 

 A simpler solution is to put rrdcached between gmetad and the rrd files. There are a few things to take care of, such as updating the rrdtool package and recompiling ganglia against the new rrdtool.

 

 Here is a step-by-step guide.

 

A. Building and installing rrdtool and ganglia.

 

1. Uninstall the existing rrdtool packages

yum -y remove rrdtool rrdtool-perl

 

2. Download the latest rrdtool packages

 

wget http://apt.sw.be/redhat/el5/en/x86_64/dag/RPMS/rrdtool-1.4.7-1.el5.rf.x86_64.rpm

wget http://apt.sw.be/redhat/el5/en/x86_64/dag/RPMS/perl-rrdtool-1.4.7-1.el5.rf.x86_64.rpm

wget http://apt.sw.be/redhat/el5/en/x86_64/dag/RPMS/rrdtool-devel-1.4.7-1.el5.rf.x86_64.rpm

 

3. Install all three in a single go, otherwise rpm will throw some weird perl dependency errors.

 

rpm -ivh rrdtool-1.4.7-1.el5.rf.x86_64.rpm rrdtool-devel-1.4.7-1.el5.rf.x86_64.rpm perl-rrdtool-1.4.7-1.el5.rf.x86_64.rpm

 

4. Get the ganglia source rpm 

 wget -O ganglia-3.4.0-1.src.rpm 'http://downloads.sourceforge.net/project/ganglia/ganglia%20monitoring%20core/3.4.0/ganglia-3.4.0-1.src.rpm?r=http%3A%2F%2Fsourceforge.net%2Fprojects%2Fganglia%2Ffiles%2Fganglia%2520monitoring%2520core%2F3.4.0%2F&ts=1360839947&use_mirror=citylan'

 

5. Install the dependencies required for building the rpm.

 

yum -y install libpng-devel libart_lgpl-devel python-devel libconfuse-devel pcre-devel freetype-devel

 

6. Build the rpm

 rpm -ivh ganglia-3.4.0-1.src.rpm

rpmbuild -ba /usr/src/redhat/SPECS/ganglia.spec

 

7. Install the new ganglia versions

 

rpm -ivh /usr/src/redhat/RPMS/x86_64/ganglia-* /usr/src/redhat/RPMS/x86_64/libganglia-3.4.0-1.x86_64.rpm
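
Since the rrdcached support comes from gmetad being linked against the new rrdtool, a quick sanity check is to confirm the rebuilt gmetad picked up the 1.4 library (assuming the binary lives in /usr/sbin):

ldd /usr/sbin/gmetad | grep rrd

It should resolve to the librrd shipped by the rrdtool-1.4.7 package installed earlier.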

 

B. Configuring rrdcached to work with ganglia.

 

1. gmetad runs as the ganglia user, while rrdcached needs write access to the rrd directory and apache needs access to the same directory, so add the ganglia user to the apache group.

usermod -a -G apache ganglia

 

2. Correct the permissions on the rrd directory

chown -R ganglia:apache /var/lib/ganglia/rrds/

 

3. Update rrdcached sysconfig startup options.

cat /etc/sysconfig/rrdcached

OPTIONS="rrdcached -p /tmp/rrdcached.pid -s apache -m 664 -l unix:/tmp/rrdcached.sock -s apache -m 777 -P FLUSH,STATS,HELP -l unix:/tmp/rrdcached.limited.sock -b /var/lib/ganglia/rrds -B"

RRDC_USER=ganglia
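
For reference, here is what those options mean (as documented in the rrdcached man page); the two -l sockets are intentional, one unrestricted socket and one limited to the FLUSH, STATS and HELP commands:

# -p /tmp/rrdcached.pid                  pid file
# -s apache -m 664                       group and mode applied to the next -l socket
# -l unix:/tmp/rrdcached.sock            unrestricted socket
# -s apache -m 777 -P FLUSH,STATS,HELP   group, mode and allowed commands for the next socket
# -l unix:/tmp/rrdcached.limited.sock    restricted socket
# -b /var/lib/ganglia/rrds -B            base directory, and only accept files under it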

4. Update the gmetad sysconfig file so that it knows the rrdcached socket address.

cat /etc/sysconfig/gmetad
RRDCACHED_ADDRESS="unix:/tmp/rrdcached.sock"

5. Update the ganglia-web configuration so that apache talks to the rrdcached daemon when fetching rrd data.

grep rrdcached_socket /var/www/html/gweb/conf_default.php
$conf['rrdcached_socket'] = "/tmp/rrdcached.sock";

6. Stop gmetad

service gmetad stop

7. Start rrdcached

service rrdcached start

8. Start gmetad

service gmetad start
 If everything is fine you should see graphs populating in the ganglia frontend, and at the same time the disk IO utilization drops dramatically :)
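
To confirm that gmetad is really writing through the daemon, the sockets should exist and rrdcached should accept a flush for one of the rrd files (the path below is only a placeholder, pick any existing .rrd under /var/lib/ganglia/rrds):

ls -l /tmp/rrdcached.sock /tmp/rrdcached.limited.sock
rrdtool flushcached --daemon unix:/tmp/rrdcached.sock /var/lib/ganglia/rrds/<cluster>/<host>/<metric>.rrd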