[root@02 ~]# dmraid -s
*** Group superset .ddf1_disks
--> Subset
name   : ddf1_4c5349202020202010000411100010044711471181ddbcc4
size   : 123045888
stride : 128
type   : stripe
status : ok
subsets: 0
devs   : 1
spares : 0
Tried to deactivate the RAID device:
[root@02 ~]# dmraid -an
ERROR: dos: partition address past end of RAID device
The dynamic shared library "libdmraid-events-ddf1.so" could not be loaded:
libdmraid-events-ddf1.so: cannot open shared object file: No such file or directory
Remove all RAID device metadata:
[root@02 ~]# dmraid -r -E /dev/sdb
Do you really want to erase "ddf1" ondisk metadata on /dev/sdb ? [y/n] :y
ERROR: ddf1: seeking device "/dev/sdb" to 32779907366912
ERROR: writing metadata to /dev/sdb, offset 64023256576 sectors, size 0 bytes returned 0
ERROR: erasing ondisk metadata on /dev/sdb
Since erasing the metadata failed, format the entire disk with zeros so that it does not get detected as a RAID device and cause further problems.
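One way to do that is with dd (a destructive sketch: double-check that /dev/sdb really is the disk you want to wipe before running it, and expect it to take a long time on a large disk):

# overwrite the whole disk with zeros
dd if=/dev/zero of=/dev/sdb bs=1M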
As the number of metrics in your environment grows, you start to see a huge impact on system IO performance. At one stage the disk utilization stays at 100% and the CPU spends a lot of time waiting for IO; at that point the IO exceeds the theoretical IO capacity of the disk. We might go ahead and add fast 15K disks or a RAID10 array, but the disk contention returns as long as we keep expanding the cluster and adding new metrics.
A simple solution is to add an rrdcached layer in the middle. There are a few things to consider, such as updating the rrdtool package and recompiling ganglia with the new rrdtool support.
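As a rough illustration of what the rrdcached layer looks like once everything is built (the socket path, rrd directory and timer values below are assumptions for this sketch, not values taken from this setup), rrdcached is started against the ganglia rrd directory and gmetad plus the web frontend are then pointed at its socket:

# illustrative invocation - adjust paths, group and timers to your environment
rrdcached -s apache -m 664 -l unix:/var/run/rrdcached.sock \
  -b /var/lib/ganglia/rrds -B \
  -p /var/run/rrdcached.pid -w 1800 -z 1800 -f 3600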
Here is a step-by-step guide for doing it.
A. Steps for building and installing rrdtool and ganglia.
1. gmetad runs as the ganglia user, rrdcached requires write access to the rrd directory, and apache needs access to the same directory. So add ganglia to the apache group, as shown below.
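A minimal sketch of that group change (assuming the web server group is literally named apache and gmetad runs as the ganglia user, as described above):

# add the ganglia user to the apache group
usermod -a -G apache ganglia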
9. Edit the td-agent configuration on the log aggregator server
<source>
type forward
port 24224
</source>
<match stats.access>
type webhdfs
host <NAMENODE OR HTTPFS HOST>
port 14000
path /user/hdfs/stats_logs/stats_access.%Y%m%d_%H.log
httpfs true
username httpfsuser
</match>
10. Start td-agent on the log aggregator host
/etc/init.d/td-agent start
* ensure that there are no errors in /var/log/td-agent/td-agent.log
While starting the namenode you might come across this error:
Class org.apache.hadoop.thriftfs.NamenodePlugin not found
CDH4 does not require this plug-in on the NameNode or DataNodes, so all configuration related to it should be removed from the namenode and datanode hdfs-site.xml.
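For illustration, these are the kind of entries to look for; the property names below assume the plugin was wired in the usual way for the old Hue thrift plugin, so match them against what is actually present in your hdfs-site.xml before removing anything:

<!-- namenode hdfs-site.xml: remove this block -->
<property>
  <name>dfs.namenode.plugins</name>
  <value>org.apache.hadoop.thriftfs.NamenodePlugin</value>
</property>

<!-- datanode hdfs-site.xml: remove this block -->
<property>
  <name>dfs.datanode.plugins</name>
  <value>org.apache.hadoop.thriftfs.DatanodePlugin</value>
</property>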
Extract data from the oplog in MongoDB and restore it on another MongoDB server.
Recently I came across a problem where we had to make a lot of modifications on a MongoDB server that would otherwise have caused issues with the production database. We removed the replication between the master and the slave, did the operations on the slave, and then brought the data back up to date using the wordnik-oss tools.
Unfortunately we did not have a replica set; we had a normal Master-Slave setup. While all the updates were happening on the slave, I needed to keep track of the data written to the master so that I could add it to the slave later. For this I used the mongo-admin-utils tool from wordnik-oss: https://github.com/wordnik/wordnik-oss.
Required software:
1. Java and Git: yum install java-1.6.0-sun-devel java-1.6.0-sun git
2. Maven: recent versions of wordnik-oss require Maven 3.
cd /usr/src
wget http://apache.techartifact.com/mirror/maven/binaries/apache-maven-3.0.4-bin.tar.gz
tar zxf apache-maven-3.0.4-bin.tar.gz
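The build step below assumes the wordnik-oss source has already been checked out under /usr/src as wordnik; roughly (the target directory name is an assumption, chosen only to match the cd commands that follow):

# clone the repository referenced above into /usr/src/wordnik
cd /usr/src
git clone https://github.com/wordnik/wordnik-oss.git wordnik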
3. Compile and build. In my case I only needed mongo-admin-utils and hence I packaged only that.
cd wordnik/modules/mongo-admin-utils
/usr/src/apache-maven-3.0.4/bin/mvn package
Once this is complete you can use mongo-admin-utils on the host.
Get Incremental oplog Backup from mongo master server
cd wordnik/modules/mongo-admin-utils
./bin/run.sh com.wordnik.system.mongodb.IncrementalBackupUtil -o /root/mongo -h mastermongodb
/root/mongo => output directory where the oplog is stored.
mastermongodb => mongodb master host.
** We can't use this tool against the slave, as there is no oplog on a slave.
Replay the Data from the oplog to the database
I had some problems restoring data from the backup, and I had to add the following settings for the restore to work without any issues.
ulimit -n 20000
I also added the following Java options in run.sh so that it does not fail with Out Of Memory (OOM) errors.
JAVA_CONFIG_OPTIONS="-Xms5g -Xmx10g -XX:NewSize=2g -XX:MaxNewSize=2g -XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:PermSize=2g -XX:MaxPermSize=2g"
Replay Command:
./bin/run.sh com.wordnik.system.mongodb.ReplayUtil -i /root/mongo -h localhost
localhost => the mongodb server to which you want the data to be added.
If you face any issues you can go ahead and file an issue at https://github.com/wordnik/wordnik-oss. The developer is an awesome person and will help you sort out the issue.
When you want to use a custom hostname for Puppet, it shows the following error.
=============
err: Could not retrieve catalog from remote server: hostname was not match with the server certificate
warning: Not using cache on failed catalog
err: Could not retrieve catalog; skipping run
err: Could not send report: hostname was not match with the server certificate
=============
In my case I wanted to use the default hostname "puppet". Add the following entries to the puppet master configuration file /etc/puppet/puppet.conf.
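The exact entries are not reproduced here, but as a sketch of the usual fix (the certname and domain below are illustrative assumptions, and the master certificate normally has to be regenerated after this change), the master is told which alternate DNS names its certificate should cover:

# illustrative values - adjust certname and dns_alt_names to your domain
[master]
    certname = puppetmaster.example.com
    dns_alt_names = puppet,puppet.example.com,puppetmaster.example.com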