Using Flume to ingest data into HDFS
Introduction
Flume is a project in the Hadoop ecosystem that ingests log data from external systems into Hadoop. To ingest data, Flume runs one or more agents, and each agent has three mandatory components:
§ Sources receive data from external systems and pass it to channels.
§ Channels buffer the data in a queue between sources and sinks.
§ Sinks drain the data collected in channels and deliver it to a destination such as HDFS.
Environment
Java: JDK 1.7
Cloudera version: CDH 4.6.0
Initial steps
- We need to make sure we have some log files on our Linux system (this walkthrough tails /var/log/system.log; see the quick check below).
- Create the configuration file for the Flume agent with the contents shown in the code walkthrough below.
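As a quick sanity check before writing the configuration, confirm that the source log exists and is still being written to, then create the agent configuration file. The commands below are a minimal sketch, assuming the paths used in this walkthrough (/var/log/system.log and /mylocalconfig.conf):
# Confirm the source log file exists and see its latest entries
ls -l /var/log/system.log
tail -n 5 /var/log/system.log
# Create the Flume agent configuration file (contents in the next section)
vi /mylocalconfig.conf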
Code walkthrough
This configuration file collects the log in real time by tailing /var/log/system.log and delivers it to a destination location in HDFS.
# Define a source on the agent that runs the Linux tail command on the system log file and writes its output to the memory-channel channel
myagent.sources.tail-source.type = exec
myagent.sources.tail-source.command = tail -F /var/log/system.log
myagent.sources.tail-source.channels = memory-channel
# Define a sink that writes events from the channel to the Flume logger (useful for checking the stream while testing)
myagent.sinks.log-sink.channel = memory-channel
myagent.sinks.log-sink.type = logger
# Define a sink that writes the events to an HDFS location as a data stream file
myagent.sinks.hdfs-sink.channel = memory-channel
myagent.sinks.hdfs-sink.type = hdfs
myagent.sinks.hdfs-sink.hdfs.writeFormat = Text
myagent.sinks.hdfs-sink.hdfs.path = hdfs:///mydata/destinationLog
myagent.sinks.hdfs-sink.hdfs.fileType = DataStream
# Define the channel that buffers events in memory between the source and the sinks
myagent.channels.memory-channel.type = memory

# Set the channel, source and sink components for this agent
myagent.channels = memory-channel
myagent.sources = tail-source
myagent.sinks = log-sink hdfs-sink
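Before starting the agent, it helps to prepare the destination on the HDFS side. This is a minimal sketch, assuming the path from the configuration above and that the user running the agent has write access to it:
# Create the HDFS destination directory used by hdfs-sink
hadoop fs -mkdir -p /mydata/destinationLog
# Confirm the directory exists
hadoop fs -ls /mydata
Note that because log-sink and hdfs-sink both drain the same memory-channel, they compete for events rather than each receiving a copy; the hdfs-sink output is what we verify below.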
Run this command to start the agent:
flume-ng agent -f /mylocalconfig.conf -n myagent
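While testing, it can also be useful to pass the Flume configuration directory and send the agent's own log output to the console. The variant below assumes a CDH-style install where the Flume configuration directory is /etc/flume-ng/conf:
flume-ng agent --conf /etc/flume-ng/conf -f /mylocalconfig.conf -n myagent -Dflume.root.logger=INFO,console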
Verify the result
Now we will do some operations on our Linux system, like creating and removing files:
vi a
rm a
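Depending on the distribution and syslog configuration, simple file operations may not add anything to /var/log/system.log. A more reliable way to produce a test entry is the standard logger utility; the file it writes to depends on your syslog setup, so adjust the source path in the configuration if your system logs elsewhere:
# Append a tagged test message to the system log via syslog
logger -t flume-test "test event for the Flume tail source"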
After these operations, the Linux system log is updated and the tail -F command ingests those changes into the HDFS location we configured above. We can check the HDFS location to see the output:
hadoop fs -text /mydata/destinationLog/* | head -n 10
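We can also list the destination directory to see the files the HDFS sink has rolled out; by default the sink names them with the FlumeData prefix, since the configuration above does not override hdfs.filePrefix:
# List the files written by the HDFS sink (default file prefix is FlumeData)
hadoop fs -ls /mydata/destinationLog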
This shows the data from the local Linux log file that has been written into our HDFS files.
Hope this blog helps you understand the steps to configure Flume to ingest data from other systems into HDFS for big data applications.