Nginx => Hadoop HDFS using Fluentd
Fluentd is a log collector that treats logs as JSON everywhere: it transmits them as JSON streams, which makes downstream log processing easier to manage.
Hadoop HDFS is a distributed filesystem that can store large volumes of logs and lets you run MapReduce jobs over them for faster log processing.
We will use fluent-plugin-webhdfs to send the logs to HDFS through the HttpFS interface.
1. Install hadoop-httpfs package
yum install hadoop-httpfs
2. Enable access to HDFS for httpfs user
vi /etc/hadoop/conf/core-site.xml
<property>
<name>hadoop.proxyuser.httpfs.hosts</name>
<value>localhost,httpfshost</value>
</property>
<property>
<name>hadoop.proxyuser.httpfs.groups</name>
<value>*</value>
</property>
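A typo in core-site.xml can keep the proxy-user settings from taking effect, so it may help to read them back before restarting the cluster. The sketch below is one way to do that with Python's standard library. The helper name proxyuser_settings and the inline sample document are illustrative assumptions, not part of Hadoop.

```python
import xml.etree.ElementTree as ET

# Sample core-site.xml content (hypothetical values); on a real node you
# would read /etc/hadoop/conf/core-site.xml instead.
SAMPLE = """<configuration>
  <property>
    <name>hadoop.proxyuser.httpfs.hosts</name>
    <value>localhost,httpfshost</value>
  </property>
  <property>
    <name>hadoop.proxyuser.httpfs.groups</name>
    <value>*</value>
  </property>
</configuration>"""


def proxyuser_settings(xml_text, user="httpfs"):
    """Return {setting: value} for the hadoop.proxyuser.<user>.* properties."""
    prefix = "hadoop.proxyuser.%s." % user
    settings = {}
    for prop in ET.fromstring(xml_text).iter("property"):
        name = (prop.findtext("name") or "").strip()
        if name.startswith(prefix):
            value = (prop.findtext("value") or "").strip()
            settings[name[len(prefix):]] = value
    return settings


if __name__ == "__main__":
    print(proxyuser_settings(SAMPLE))
```

If the parser raises an error, or the hosts or groups entry is missing from the result, the file is worth a second look before moving on to the restart.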
3. Restart the hadoop cluster.
4. Start the hadoop-httpfs service
/etc/init.d/hadoop-httpfs start
5. Check whether it is working
curl -i "http://<namenode>:14000?user.name=httpfs&op=homedir"
HTTP/1.1 200 OK
Server: Apache-Coyote/1.1
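The same check can be scripted. The sketch below assumes an HttpFS version that serves the WebHDFS REST path /webhdfs/v1 with op=GETHOMEDIRECTORY, which may differ from the shorter URL used in the curl command above. The function names httpfs_url and home_directory and the host name httpfshost are placeholders chosen for illustration.

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import urlopen


def httpfs_url(host, op, user, path="/", port=14000):
    """Build a WebHDFS-style REST URL for an HttpFS endpoint."""
    query = urlencode({"op": op, "user.name": user})
    return "http://%s:%d/webhdfs/v1%s?%s" % (host, port, quote(path), query)


def home_directory(host, user="httpfs"):
    """Ask HttpFS for the user's home directory (needs a running server)."""
    url = httpfs_url(host, "GETHOMEDIRECTORY", user)
    with urlopen(url, timeout=10) as resp:
        return json.load(resp)


if __name__ == "__main__":
    # Only prints the URL; call home_directory() against a live host.
    print(httpfs_url("httpfshost", "GETHOMEDIRECTORY", "httpfs"))
```

Keeping the URL construction separate from the network call makes it easy to reuse httpfs_url for other operations such as LISTSTATUS on the log directory.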
6. Install Treasure Data's td-agent on the nginx servers and on the log-aggregator server
cat > /etc/yum.repos.d/treasuredata.repo <<'EOF'
[treasuredata]
name=TreasureData
baseurl=http://packages.treasure-data.com/redhat/$basearch
gpgcheck=0
EOF
yum install td-agent
7. Install the fluent-logger gem and fluent-plugin-webhdfs on the log-aggregator host
gem install fluent-logger --no-ri --no-rdoc
/usr/lib64/fluent/ruby/bin/fluent-gem install fluent-plugin-webhdfs
8. Edit td-agent configuration in nginx server
vi /etc/td-agent/td-agent.conf
# Tail the nginx logs associated with stats.slideshare.net
<source>
type tail
path /var/log/nginx/stats_access.log
format apache
tag stats.access
pos_file /var/log/td-agent/stats_access.pos
</source>
<match stats.access>
type forward
<server>
host <LOG AGGREGATOR NODE>
port 24224
</server>
retry_limit 5
<secondary>
type file
path /var/log/td-agent/stats_access.log
</secondary>
</match>
Then edit the Nginx configuration so that the access log is written in the Apache log format, which is what "format apache" above expects:
log_format main '$remote_addr - $remote_user [$time_local] "$request" '
'$status $body_bytes_sent "$http_referer" '
'"$http_user_agent"';
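To see why the log format matters, it can help to look at what a line in this format turns into once it is parsed into a JSON record. The sketch below is a rough approximation: the regular expression was written for the log_format above and is not the exact pattern Fluentd uses internally, the field names are chosen to resemble a typical Apache-style record, and the sample line is made up.

```python
import json
import re

# Approximate pattern for the log_format above (illustrative only).
LINE_RE = re.compile(
    r'^(?P<host>\S+) - (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)(?: \S+)?" '
    r'(?P<code>\d{3}) (?P<size>\d+|-) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"$'
)


def parse_line(line):
    """Turn one access-log line into a dict, or None if it does not match."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    record = match.groupdict()
    record["code"] = int(record["code"])
    record["size"] = 0 if record["size"] == "-" else int(record["size"])
    return record


if __name__ == "__main__":
    # Made-up sample line in the format above.
    sample = ('192.0.2.10 - - [12/Oct/2012:07:10:00 +0000] '
              '"GET /index.html HTTP/1.1" 200 17441 '
              '"-" "curl/7.19.7"')
    print(json.dumps(parse_line(sample), sort_keys=True))
```

A line that does not follow the expected layout returns None here, which mirrors the practical point of this step: if Nginx writes a different format, the parser on the Fluentd side will not match it.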
9. Edit td-agent configuration in log aggregator server
<source>
type forward
port 24224
</source>
<match stats.access>
type webhdfs
host <NAMENODE OR HTTPFS HOST>
port 14000
path /user/hdfs/stats_logs/stats_access.%Y%m%d_%H.log
httpfs true
username httpfsuser
</match>
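The %Y%m%d_%H placeholders in the path option mean that events are grouped into one HDFS file per hour. The sketch below illustrates that mapping with Python's strftime. It only models the placeholder expansion; the plugin's own time zone handling and buffering are not covered, and the helper name hdfs_path_for is an assumption made for this example.

```python
from datetime import datetime

# Same path template as in the <match> block above.
PATH_TEMPLATE = "/user/hdfs/stats_logs/stats_access.%Y%m%d_%H.log"


def hdfs_path_for(event_time, template=PATH_TEMPLATE):
    """Expand the strftime-style placeholders for a given event time."""
    return event_time.strftime(template)


if __name__ == "__main__":
    # Two events in different hours land in different files.
    for hour in (7, 8):
        print(hdfs_path_for(datetime(2012, 10, 12, hour, 30)))
```

Dropping _%H from the template would give one file per day instead, which may be preferable for low-traffic sites.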
10. Start td-agent in log aggregator host
/etc/init.d/td-agent start
* Ensure that there are no errors in /var/log/td-agent/td-agent.log
11. Start td-agent in nginx servers
/etc/init.d/td-agent start
/etc/init.d/nginx restart
12. Check whether you can see the logs in HDFS
sudo -u hdfsuser hdfs dfs -ls /user/hdfs/stats_logs/
Found 1 items
-rw-r--r-- 3 httpfsuser group 17441 2012-10-12 01:10 /user/hdfs/stats_logs/stats_access.20121012_07.log
That is all. You now have log aggregation from Nginx into HDFS up and running.