sekhartechblog: Installation Of HBase In Ubuntu

Use Apache HBase when you need random, realtime read/write access to your Big Data. This project's goal is the hosting of very large tables -- billions of rows X millions of columns -- atop clusters of commodity hardware. Apache HBase is an open-source, distributed, versioned, column-oriented store modeled after Google's Bigtable: A Distributed Storage System for Structured Data by Chang et al. Just as Bigtable leverages the distributed data storage provided by the Google File System, Apache HBase provides Bigtable-like capabilities on top of Hadoop and HDFS.

I installed HBase 0.94.2 on the following versions of Linux, Java and Hadoop.

UBUNTU 12.04 LTS

JAVA 1.7.0_09

HADOOP 1.1.0

I use hduser as a dedicated Hadoop system user, and Hadoop is installed in the /home/hduser/hadoop folder. I am going to install HBase in the /home/hduser folder. Change to the hduser home directory and execute the commands below.

Download HBase from the URL below using wget.

wget http://apache.techartifact.com/mirror/hbase/stable/hbase-0.94.2.tar.gz

Extract the tar file.

sudo tar xzf hbase-0.94.2.tar.gz

Rename the extracted directory to hbase.

sudo mv hbase-0.94.2 hbase
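The extract and rename steps above can be combined into one small function. This is only a minimal sketch: the name install_tarball is hypothetical, it assumes the archive unpacks into a single top-level directory named after the archive (as hbase-0.94.2.tar.gz does), and it runs without sudo so that the files stay owned by the current user.

```shell
# Hypothetical helper: unpack <tarball> into <dest_parent> and rename the
# unpacked directory to <new_name>. Assumes the archive contains one
# top-level directory named like the archive minus ".tar.gz".
install_tarball() {
    tarball=$1
    dest_parent=$2
    new_name=$3
    top_dir=$(basename "$tarball" .tar.gz)

    # Refuse to overwrite an existing installation.
    if [ -e "$dest_parent/$new_name" ]; then
        echo "already exists: $dest_parent/$new_name" >&2
        return 1
    fi

    tar xzf "$tarball" -C "$dest_parent" || return 1
    mv "$dest_parent/$top_dir" "$dest_parent/$new_name"
}

# Example (after the wget step):
#   install_tarball hbase-0.94.2.tar.gz /home/hduser hbase
```

The existence check is there so that running the function twice does not move a fresh copy inside an older hbase directory.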

Set the JAVA_HOME and HBASE_CLASSPATH in hbase-env.sh.

The hbase-env.sh file is in the conf folder of HBase [/home/hduser/hbase/conf/hbase-env.sh].

Change

# The java implementation to use. Required.

# export JAVA_HOME=/usr/lib/j2sdk1.5-sun

to

# The java implementation to use. Required.

export JAVA_HOME=/usr/lib/jvm/java-1.7.0-openjdk-amd64

Add the HBase conf directory to HBASE_CLASSPATH.

export HBASE_CLASSPATH=/home/hduser/hbase/conf/

Set the HBASE_HOME path and add its bin directory to PATH.

export HBASE_HOME=/home/hduser/hbase

export PATH=${PATH}:${HBASE_HOME}/bin
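Before going further it can help to confirm that both variables point at real installations. The check below is a minimal sketch; check_hbase_env is a hypothetical name, and it only assumes that JAVA_HOME should contain bin/java and HBASE_HOME should contain bin/hbase.

```shell
# Hypothetical sanity check for the variables set above.
# Prints one line per problem and returns non-zero if anything is missing.
check_hbase_env() {
    status=0
    if [ -z "${JAVA_HOME:-}" ] || [ ! -x "$JAVA_HOME/bin/java" ]; then
        echo "JAVA_HOME does not contain bin/java: ${JAVA_HOME:-unset}"
        status=1
    fi
    if [ -z "${HBASE_HOME:-}" ] || [ ! -x "$HBASE_HOME/bin/hbase" ]; then
        echo "HBASE_HOME does not contain bin/hbase: ${HBASE_HOME:-unset}"
        status=1
    fi
    return $status
}

# Example:
#   check_hbase_env && echo "environment looks fine"
```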

Configure hbase-site.xml.

The hbase-site.xml file is in the conf folder of HBase [/home/hduser/hbase/conf/hbase-site.xml].

I have a 3-node Hadoop cluster with one master and two slaves. The master node runs the namenode, secondarynamenode and jobtracker. The slave nodes run the datanode and tasktracker.

<configuration>

<property>
<name>hbase.rootdir</name>
<value>hdfs://master:54310/hbase</value>
<description>The directory shared by region servers. Should be fully-qualified to include the filesystem to use.
E.g: hdfs://NAMENODE_SERVER:PORT/HBASE_ROOTDIR
</description>
</property>

<property>
<name>hbase.cluster.distributed</name>
<description>The mode the cluster will be in. Possible values are
false: standalone and pseudo-distributed setups with managed Zookeeper
true: fully-distributed with unmanaged Zookeeper Quorum (see hbase-env.sh)
</description>
</property>

<property>
<name>hbase.zookeeper.quorum</name>
<description>Comma separated list of servers in the ZooKeeper Quorum.
For example, "host1.mydomain.com,host2.mydomain.com,host3.mydomain.com".
By default this is set to localhost for local and pseudo-distributed modes
of operation. For a fully-distributed setup, this should be set to a full
list of ZooKeeper quorum servers. If HBASE_MANAGES_ZK is set in hbase-env.sh
this is the list of servers which we will start/stop ZooKeeper on.
</description>
</property>

<property>
<name>hbase.zookeeper.dns.nameserver</name>
<description>The host name or IP address of the name server (DNS) which a ZooKeeper server should use to determine the host name used by the master for communication and display purposes.
</description>
</property>

<property>
<name>hbase.regionserver.dns.nameserver</name>
<description>The host name or IP address of the name server (DNS) which a region server should use to determine the host name used by the master for communication and display purposes.
</description>
</property>

<property>
<name>hbase.master.dns.nameserver</name>
<description>The host name or IP address of the name server (DNS) which a master should use to determine the host name used for communication and display purposes.
</description>
</property>

</configuration>

For the properties below, I used the master's IP address as the value.

hbase.zookeeper.quorum

hbase.zookeeper.dns.nameserver

hbase.master.dns.nameserver

hbase.regionserver.dns.nameserver
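The values for these properties are not shown in the listing above. As a minimal sketch of what the entries could look like, the hypothetical helper below prints a property element for a given name and value. MASTER_IP is a placeholder, and setting hbase.cluster.distributed to true is an assumption for a multi-node cluster, not something stated in this post.

```shell
# Hypothetical helper: print one <property> element for hbase-site.xml.
# It does no XML escaping, so it only suits simple values such as IPs.
hbase_property() {
    printf '<property>\n<name>%s</name>\n<value>%s</value>\n</property>\n' "$1" "$2"
}

# Placeholder address from the documentation range; replace with the master's IP.
MASTER_IP=192.0.2.10

# Assumed value for a fully-distributed cluster.
hbase_property hbase.cluster.distributed true
hbase_property hbase.zookeeper.quorum "$MASTER_IP"
hbase_property hbase.zookeeper.dns.nameserver "$MASTER_IP"
hbase_property hbase.master.dns.nameserver "$MASTER_IP"
hbase_property hbase.regionserver.dns.nameserver "$MASTER_IP"
```

The printed elements would go between the configuration tags of hbase-site.xml.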

Specify RegionServers.

The regionservers file is in the conf folder of HBase [/home/hduser/hbase/conf/regionservers]. List one host per line.

master

slave1

slave2

Here I have also listed the master as a region server.
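If you prefer to create the file from the command line, a minimal sketch follows; write_regionservers is a hypothetical name, and the helper overwrites whatever the file contained before.

```shell
# Hypothetical helper: write one host name per line to a regionservers file.
# The first argument is the file path, the remaining arguments are hosts.
write_regionservers() {
    file=$1
    shift
    printf '%s\n' "$@" > "$file"
}

# Example:
#   write_regionservers /home/hduser/hbase/conf/regionservers master slave1 slave2
```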

Copy the hbase folder from the master node to the slave nodes with scp.

scp -r /home/hduser/hbase 10.146.244.62:/home/hduser/hbase    # Slave1

scp -r /home/hduser/hbase 10.146.242.32:/home/hduser/hbase    # Slave2
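With more slaves, the copy commands can be generated from the regionservers file instead of typed by hand. The sketch below only prints the commands; print_copy_commands is a hypothetical name, and it assumes the names in the file are resolvable from the master and that the target path mirrors the source path.

```shell
# Hypothetical helper: print the scp command for every host listed in a
# regionservers-style file, skipping the host given as <local_name>.
# It only prints; pipe the output to sh to actually copy.
print_copy_commands() {
    hosts_file=$1
    local_name=$2
    src_dir=$3
    while IFS= read -r host || [ -n "$host" ]; do
        # Skip blank lines and the local node.
        [ -z "$host" ] && continue
        [ "$host" = "$local_name" ] && continue
        # Copy into the parent directory so the folder keeps its name.
        echo "scp -r $src_dir $host:$(dirname "$src_dir")/"
    done < "$hosts_file"
}

# Example:
#   print_copy_commands /home/hduser/hbase/conf/regionservers master /home/hduser/hbase
```

Printing first makes it easy to review the list before anything is copied.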

Run HBase.

Now run the hbase shell command.
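The start command itself is not shown above. A minimal sketch, assuming the stock start-hbase.sh script from $HBASE_HOME/bin is on the PATH and the JDK's jps tool is available; daemon_listed is a hypothetical helper that checks jps output for a daemon such as HMaster or HRegionServer.

```shell
# Hypothetical check: read `jps` output on stdin and report whether the
# named HBase daemon (for example HMaster or HRegionServer) is listed.
daemon_listed() {
    # jps prints "<pid> <main class name>" per line; match the name exactly.
    awk -v name="$1" '$2 == name { found = 1 } END { exit found ? 0 : 1 }'
}

# Typical use on the master node (commented out so the sketch has no side effects):
#   start-hbase.sh
#   jps | daemon_listed HMaster && echo "HMaster is running"
#   hbase shell
```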

NOTE: If you get an error such as "DNS name not found", you have to create forward and reverse lookup zones for the IP addresses. Use bind9 to create the lookups.

REST API.

To start the HBase REST server, use the command below.

hbase rest start    # the REST server listens on port 8080

To use a different port, pass it with the -p option.

hbase rest start -p 9090
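To confirm the REST server is answering, you can request its version resource. This is a minimal sketch, assuming curl is installed and the server runs on the local machine; rest_url is a hypothetical helper that only builds the base URL.

```shell
# Hypothetical helper: build the base URL of the HBase REST server.
# The port defaults to 8080 when no second argument is given.
rest_url() {
    host=$1
    port=${2:-8080}
    echo "http://$host:$port"
}

# Example, for a server started with -p 9090 (commented out, needs a running server):
#   curl -H "Accept: application/json" "$(rest_url localhost 9090)/version/cluster"
```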

HBase MapReduce.

To run MapReduce jobs in Hadoop that use HBase as input and output, you have to add the HBase jar files to the Hadoop classpath. Otherwise you get NoClassDefFoundError errors.

I added the jars below to HADOOP_CLASSPATH to get it working.

export HADOOP_CLASSPATH=$HBASE_HOME/hbase-0.94.2.jar:$HBASE_HOME/hbase-0.94.2-tests.jar:$HBASE_HOME/lib/zookeeper-3.4.3.jar:$HBASE_HOME/lib/avro-1.5.3.jar:$HBASE_HOME/lib/avro-ipc-1.5.3.jar:$HBASE_HOME/lib/commons-cli-1.2.jar:$HBASE_HOME/lib/jackson-core-asl-1.8.8.jar:$HBASE_HOME/lib/jackson-mapper-asl-1.8.8.jar:$HBASE_HOME/lib/commons-httpclient-3.1.jar:$HBASE_HOME/lib/jetty-6.1.26.jar:$HBASE_HOME/lib/hadoop-core-1.0.3.jar:$HBASE_HOME/lib/com.google.protobuf_2.3.0.jar

All of the above jars come with HBase except com.google.protobuf_2.3.0.jar. You have to download that one separately and add it to the Hadoop classpath.
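A long classpath like the one above is easy to mistype. As a minimal sketch, the hypothetical helper join_classpath below joins jar paths with ":" and warns on standard error about any jar that is not on disk, which is one way to catch a missing jar before the job fails.

```shell
# Hypothetical helper: join the given jar paths with ":" and warn about
# any path that does not exist on disk. The joined path goes to stdout.
join_classpath() {
    cp=""
    for jar in "$@"; do
        [ -f "$jar" ] || echo "warning: missing $jar" >&2
        cp="${cp:+$cp:}$jar"
    done
    echo "$cp"
}

# Example:
#   export HADOOP_CLASSPATH=$(join_classpath \
#       "$HBASE_HOME/hbase-0.94.2.jar" \
#       "$HBASE_HOME/lib/zookeeper-3.4.3.jar")
```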


Saturday, 5 January 2013
