
How To Configure Hadoop Cluster For Successful Hadoop Deployments?

In this blog post, we will learn how to configure a Hadoop cluster to maximize the success of production deployments and minimize long-term adjustments.

Before you start working on Hadoop, it is necessary to decide on hardware that will best support a successful Hadoop architecture implementation. You should also be clear on cluster planning, CPU usage, and memory allocation. Here we will discuss how clusters can be configured in a Hadoop architecture for better business solutions. After reading this blog, you will have a strong idea of successful product deployment and cluster configuration.

It is necessary to review some important attributes, such as the network, hosts, and disks, and to check whether they are configured correctly. It is also necessary to check how services and disks are laid out so that they are utilized in the best possible way, minimizing problems related to new data sets.

Network for Hadoop architecture

A Hadoop process must discover the hostname of the server it runs on and its correct IP address. This is done through the DNS server, which must be configured properly for lookups. A correctly configured node resolves in both directions: forward lookup (hostname to IP address) and reverse lookup (IP address to hostname). All cluster hosts should be able to communicate reliably with each other. If you are using a Linux operating system, it is easy to check the network configuration details with the host command, as shown below.
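
For example, with an illustrative hostname and IP address (substitute your own), a healthy node resolves in both directions:

          $ host master01.example.com
          master01.example.com has address 10.0.0.11

          $ host 10.0.0.11
          11.0.0.10.in-addr.arpa domain name pointer master01.example.com.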

You may be wondering why you should use DNS every time. DNS is easy to implement and less prone to errors than hand-maintained host entries. The domain name should be fully qualified for security reasons. You can verify the fully qualified domain name with the hostname -f command, as shown below.
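
A quick check (the output shown is illustrative):

          $ hostname -f
          master01.example.com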

Cloudera Manager for cluster management

If you find cluster management tough, we strongly recommend using Cloudera Manager for refined results. It works as a pioneering tool for managing complex data nodes. If you are not yet familiar with Cloudera Manager, study its documentation, which makes its benefits pretty clear. Cloudera Manager is available in different versions according to your business needs and requirements. It puts a setup wizard at your fingertips, and installation is easy and quick.

A cluster with 50+ nodes can be handled pretty well with Cloudera Manager. Integrating external services such as HBase and Hive alongside Cloudera Manager is very common, and additional services for trail-blazing data management can be deployed on demand. If you dig a little deeper into the concept, you will find that various service components are mapped together internally in ways you never have to think about.

Conclusion
The above discussion shows that cluster management is easy when done with the proper tools and sufficient knowledge. As a developer, you should spend extra time understanding the Hadoop architecture and configuring clusters. By following proper guidelines and instructions, cluster management can work in your favor, resulting in successful deployments by Hadoop architects and developers. So don't fuss so much over cluster configuration, and spend more time on other business activities, like a boss.

Stay engaged with our future posts to learn more about clusters and Hadoop architecture.

Read more related to this:

  • Learn Apache Hive installation on Ubuntu Linux (14.04 LTS)
Apache Hive is data warehouse software built on top of Hadoop. It facilitates functions like data analysis, large database management, and data summarization. Install it and take a Hive tour on your Ubuntu Linux.

Installing HBase on Ubuntu Linux

Aegis big data Hadoop developers are sharing this tutorial with the worldwide Hadoop development community to help them learn the installation of HBase on Ubuntu Linux in standalone mode. We will discuss every basic step and prerequisite required for installing HBase so that you understand it better.

=> This tutorial introduces how to install HBase on Ubuntu Linux in standalone mode.

HBase Standalone Mode:

=> By default, HBase runs in standalone mode. In standalone mode, HBase uses the local file system rather than HDFS.

=> In standalone mode, HBase runs all daemons and ZooKeeper in the same JVM.

Conventions that I followed in this article:

commands
configuration file names  
Lines to be inserted in files.

Basic Prerequisites

=> Hadoop
=> Java
=> Loopback IP - HBase expects the loopback IP to be 127.0.0.1

  • This tutorial has been tested using the following environment:
OS - Ubuntu Linux (14.04 LTS), 64-bit
Hadoop - 1.0.4
HBase - 0.94.6
Java - Oracle Java 1.7.0_75

-> To install HBase in standalone mode, please follow the steps below.
-> I am assuming that Java and Hadoop are already installed and running properly, so I am not explaining those two prerequisites in the installation steps.

  1. Check your IP setup:

  • One of the weird problems that occurs is the loopback IP problem. By default, when you open the hosts file at /etc/hosts, you see something like:

127.0.0.1 localhost
127.0.1.1 <server fqdn> <server name, as in /etc/hostname>

now, change the second line to:

127.0.0.1 localhost  
127.0.0.1 (name of your computer)

My /etc/hosts file now looks like the example below.
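
A minimal sketch, assuming the machine is named ubuntu (use the name from your own /etc/hostname):

127.0.0.1 localhost
127.0.0.1 ubuntu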

2. Download and extract HBase

  • Download the HBase 0.94.6 release tarball (hbase-0.94.6.tar.gz) from the Apache HBase release archive.
  • Change into the directory where HBase was downloaded; by default it is saved in ~/Downloads:

             $ cd Downloads/

  • Extract the tar file using the following command:

            $ tar -xzvf hbase-0.94.6.tar.gz

  • Create a directory in /usr/lib using the following command:

            $ sudo mkdir /usr/lib/hbase

  • Move the extracted hbase-0.94.6 folder into the newly created directory using the following command (sudo is needed because /usr/lib/hbase is owned by root):

            $ sudo mv hbase-0.94.6 /usr/lib/hbase/hbase-0.94.6

3. Configure Java for HBase

  • Open the file {HBASE_INSTALL_DIRECTORY}/conf/hbase-env.sh and set the path to the Java installation on your system.

  • My hbase-env.sh file looks like the example below.

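
A minimal sketch of the relevant line, assuming Oracle Java 7 is installed under /usr/lib/jvm/java-7-oracle (adjust the path to your actual Java home):

         # The java implementation to use.
         export JAVA_HOME=/usr/lib/jvm/java-7-oracle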

  • Set the HBASE_HOME path in the .bashrc file.

  • I used the command gksu gedit $HOME/.bashrc to edit the file and appended the following lines:

         #HBASE_PATH
         export HBASE_HOME=/usr/lib/hbase/hbase-0.94.6
         export PATH=$PATH:$HBASE_HOME/bin


=> Now we have to set the directory path in conf/hbase-site.xml; add the following lines to set the HBase root directory where HBase will store its data.
=> By default, hbase.rootdir is set to /tmp/hbase-${user.name}, which means you will lose all your data whenever your server reboots.
=> So replace the {USER} and directory placeholders in hbase-site.xml with the path to a directory where you want HBase to store its data.

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>hbase.rootdir</name>
    <value>file:///home/{USER}/{LOCATION_OF_DIRECTORY_WHERE_YOU_WANT_HBASE_STORES_DATA}</value>
  </property>
  <property>
    <name>hbase.zookeeper.property.dataDir</name>
    <value>/home/{USER}/{LOCATION_OF_DIRECTORY_WHERE_YOU_WANT_ZOOKEEPER_STORES_DATA}</value>
  </property>
</configuration>


  • My hbase-site.xml looks like the example below.

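
A minimal filled-in sketch, assuming a user named hduser with data directories under the home directory (both paths are illustrative):

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>hbase.rootdir</name>
    <value>file:///home/hduser/hbase</value>
  </property>
  <property>
    <name>hbase.zookeeper.property.dataDir</name>
    <value>/home/hduser/zookeeper</value>
  </property>
</configuration>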

Now you can start HBase and enter its shell using the following commands:

start-hbase.sh
hbase shell

To stop HBase, use the following command:

stop-hbase.sh
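
You can check at any point whether HBase is running by listing the Java processes with jps; in standalone mode a single HMaster process appears while HBase is up (the process ID shown is illustrative):

         $ jps
         12345 HMaster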

HBase also provides web interfaces:

http://localhost:60010 - master
http://localhost:60030 - region server

Now your HBase is running in standalone mode, and you can run HBase shell commands such as the ones below.
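
A few illustrative commands to try inside the shell (the table name test and column family cf are made up for this example):

         hbase(main):001:0> create 'test', 'cf'
         hbase(main):002:0> put 'test', 'row1', 'cf:a', 'value1'
         hbase(main):003:0> scan 'test'
         hbase(main):004:0> disable 'test'
         hbase(main):005:0> drop 'test'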

Do try this, and let us know your feedback on the tutorial. For queries related to Hadoop development, mention your requirements below in the comments and our big data Hadoop development programmers will answer all your queries.
