Saturday, 5 January 2013

Installation of HBase on Ubuntu


Use Apache HBase when you need random, realtime read/write access to your Big Data. This project's goal is the hosting of very large tables -- billions of rows X millions of columns -- atop clusters of commodity hardware. Apache HBase is an open-source, distributed, versioned, column-oriented store modeled after Google's Bigtable: A Distributed Storage System for Structured Data by Chang et al. Just as Bigtable leverages the distributed data storage provided by the Google File System, Apache HBase provides Bigtable-like capabilities on top of Hadoop and HDFS.

HBase 0.94.2 was installed on the following versions of Linux, Java and Hadoop:

UBUNTU 12.04 LTS
JAVA 1.7.0_09
HADOOP 1.1.0

I have hduser as a dedicated Hadoop system user, and Hadoop is installed in the /home/hduser/hadoop folder. I am going to install HBase in the /home/hduser folder. Change to the hduser home directory and run the commands below.

Download the hbase from below URL using wget.

Extract the tar file.
sudo tar xzf hbase-0.94.2.tar.gz

Rename the extracted folder to hbase.
sudo mv hbase-0.94.2 hbase

Set JAVA_HOME and HBASE_CLASSPATH in hbase-env.sh.
The hbase-env.sh file is in the conf folder of HBase [/home/hduser/hbase/conf/hbase-env.sh].

Change
 # The java implementation to use.  Required.
 # export JAVA_HOME=/usr/lib/j2sdk1.5-sun
to
 # The java implementation to use.  Required.
 export JAVA_HOME=/usr/lib/jvm/java-1.7.0-openjdk-amd64
Add the HBase conf directory to HBASE_CLASSPATH:
export HBASE_CLASSPATH=/home/hduser/hbase/conf/
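
If you are not sure which directory to use for JAVA_HOME, it can usually be derived from the resolved path of the java binary. The helper below is only a sketch: the function name java_home_from_binary is made up for illustration, and it assumes the binary sits at <JAVA_HOME>/bin/java or <JAVA_HOME>/jre/bin/java, which is the usual JDK layout but may not match every install.

```shell
# Derive a JAVA_HOME candidate from the full path of a java binary.
# Assumes the layout <JAVA_HOME>/bin/java or <JAVA_HOME>/jre/bin/java.
java_home_from_binary() {
  java_bin="$1"
  java_home="${java_bin%/bin/java}"   # drop the trailing /bin/java
  java_home="${java_home%/jre}"       # drop a trailing /jre, if present
  printf '%s\n' "$java_home"
}
```

For example, java_home_from_binary "$(readlink -f "$(command -v java)")" prints a candidate value that you can compare against the path you put in hbase-env.sh.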

Set HBASE_HOME and add its bin folder to PATH.
export HBASE_HOME=/home/hduser/hbase
export PATH=${PATH}:${HBASE_HOME}/bin

hbase-site.xml.
The hbase-site.xml file is in the conf folder of HBase [/home/hduser/hbase/conf/hbase-site.xml].

I have a 3-node Hadoop cluster: one master and two slaves. The master node runs the namenode, secondarynamenode and jobtracker. The slave nodes run the datanode and tasktracker.

<configuration>
   <property>
    <name>hbase.rootdir</name>
    <value>hdfs://master:54310/hbase</value>
    <description>
       The directory shared by region servers. Should be fully-qualified to include the filesystem to use.
       E.g: hdfs://NAMENODE_SERVER:PORT/HBASE_ROOTDIR
    </description>   
   </property>
   <property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
    <description>The mode the cluster will be in. Possible values are
      false: standalone and pseudo-distributed setups with managed Zookeeper
      true: fully-distributed with unmanaged Zookeeper Quorum (see hbase-env.sh)
    </description>
   </property>
   <property>
    <name>hbase.zookeeper.quorum</name>
    <value>10.146.244.133</value>
    <description>Comma separated list of servers in the ZooKeeper Quorum.
      For example, "host1.mydomain.com,host2.mydomain.com,host3.mydomain.com".
      By default this is set to localhost for local and pseudo-distributed modes
      of operation. For a fully-distributed setup, this should be set to a full
      list of ZooKeeper quorum servers. If HBASE_MANAGES_ZK is set in hbase-env.sh
      this is the list of servers which we will start/stop ZooKeeper on.
      </description>
   </property>
   <property>
    <name>hbase.zookeeper.dns.nameserver</name>
    <value>10.146.244.133</value>
    <description> The host name or IP address of the name server (DNS) which a ZooKeeper server should use to determine the host name used by the master for communication and display purposes.
    </description>
  </property>

 <property>
    <name>hbase.regionserver.dns.nameserver</name>
    <value>10.146.244.133</value>
    <description> The host name or IP address of the name server (DNS) which a region server should use to determine the host name used by the master for communication and display purposes.
    </description>
  </property>

 <property>
    <name>hbase.master.dns.nameserver</name>
    <value>10.146.244.133</value>
    <description> The host name or IP address of the name server (DNS) which a master should use to determine the host name used for communication and display purposes.    </description>
  </property>
</configuration>
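
After editing the file, it can help to print back the value HBase will actually read for a property, for instance to spot stray whitespace. The function below is a rough sketch, not a real XML parser: the name site_value is made up for illustration, and it assumes each <name> and its <value> are on separate lines with the value on a single line, as in the file above.

```shell
# Print the <value> that follows <name>KEY</name> in a Hadoop-style XML file.
# Usage: site_value FILE KEY
site_value() {
  awk -v key="$2" '
    # Remember that we have seen the requested property name.
    index($0, "<name>" key "</name>") { found = 1; next }
    # The first <value>...</value> line after the name holds its value.
    found && match($0, /<value>.*<\/value>/) {
      # Skip the 7-character <value> tag and the 8-character </value> tag.
      print substr($0, RSTART + 7, RLENGTH - 15)
      exit
    }
  ' "$1"
}
```

For example, printf '[%s]\n' "$(site_value /home/hduser/hbase/conf/hbase-site.xml hbase.rootdir)" shows the value between brackets, so a leading or trailing space is easy to see.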

In the properties below I used the master's IP address.
hbase.zookeeper.quorum
hbase.zookeeper.dns.nameserver
hbase.master.dns.nameserver
hbase.regionserver.dns.nameserver

Specify the RegionServers.
The regionservers file is in the conf folder of HBase.
[/home/hduser/hbase/conf/regionservers]

master
slave1
slave2

Here I have specified the master also as a regionserver.

Copy the hbase folder from the master node to the slave nodes.
scp -r /home/hduser/hbase 10.146.244.62:/home/hduser/hbase      # Slave1
scp -r /home/hduser/hbase 10.146.242.32:/home/hduser/hbase      # Slave2
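
With more than a couple of slaves, the copy commands can be generated from the regionservers file instead of being typed by hand. The sketch below only prints the commands so you can review them first; the function name print_copy_commands is made up for illustration, and it assumes the host names in the file resolve from the master.

```shell
# Print (not run) one scp command per host in a regionservers file,
# skipping the host we are copying from.
# Usage: print_copy_commands REGIONSERVERS_FILE LOCAL_HOST HBASE_DIR
print_copy_commands() {
  regionservers_file="$1"
  local_host="$2"
  hbase_dir="$3"
  # The "|| [ -n ... ]" part keeps a last line that has no trailing newline.
  while IFS= read -r host || [ -n "$host" ]; do
    [ -z "$host" ] && continue                 # ignore blank lines
    [ "$host" = "$local_host" ] && continue    # do not copy onto ourselves
    printf 'scp -r %s %s:%s\n' "$hbase_dir" "$host" "$hbase_dir"
  done < "$regionservers_file"
}
```

For example, print_copy_commands /home/hduser/hbase/conf/regionservers master /home/hduser/hbase prints one line each for slave1 and slave2; pipe the output to sh once it looks right.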

Run HBase.
If the HBase daemons are not already running, start them on the master with start-hbase.sh. Then run the hbase shell command.

NOTE: If you get an error such as DNS name not found, you have to create forward and reverse lookup zones for the IPs. Use bind9 to create the lookups.

REST API.
To start the HBase REST server, use the command below. The REST server starts listening on port 8080.
hbase rest start
To use a different port, pass it with -p.
hbase rest start -p 9090
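
Once the REST server is up, a quick way to check it is to request its version information over HTTP. The sketch below makes a few assumptions: the function names are made up for illustration, curl is installed, and the /version resource is available, which to my knowledge is part of the HBase REST interface but is worth confirming against your version.

```shell
# Build the base URL of the REST server; the port defaults to 8080.
# Usage: rest_base_url HOST [PORT]
rest_base_url() {
  printf 'http://%s:%s' "$1" "${2:-8080}"
}

# Ask the REST server for its version information as JSON.
# Needs curl and a running REST server, so it is only defined here, not called.
# Usage: rest_version HOST [PORT]
rest_version() {
  curl -s -H "Accept: application/json" "$(rest_base_url "$@")/version"
}
```

For example, rest_version master queries the default port, and rest_version master 9090 queries a server started with -p 9090.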

HBase MapReduce
To run MapReduce jobs in Hadoop that use HBase as input and output, you have to add the HBase jar files to the Hadoop classpath; otherwise you get NoClassDefFoundError errors.

I added the jars below to HADOOP_CLASSPATH to get it working.

export HADOOP_CLASSPATH=$HBASE_HOME/hbase-0.94.2.jar:$HBASE_HOME/hbase-0.94.2-tests.jar:$HBASE_HOME/lib/zookeeper-3.4.3.jar:$HBASE_HOME/lib/avro-1.5.3.jar:$HBASE_HOME/lib/avro-ipc-1.5.3.jar:$HBASE_HOME/lib/commons-cli-1.2.jar:$HBASE_HOME/lib/jackson-core-asl-1.8.8.jar:$HBASE_HOME/lib/jackson-mapper-asl-1.8.8.jar:$HBASE_HOME/lib/commons-httpclient-3.1.jar:$HBASE_HOME/lib/jetty-6.1.26.jar:$HBASE_HOME/lib/hadoop-core-1.0.3.jar:$HBASE_HOME/lib/com.google.protobuf_2.3.0.jar

All of the above jars come with HBase except com.google.protobuf_2.3.0.jar. You have to download it separately from the internet and add it to the Hadoop classpath.
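
A long classpath like this is easy to get wrong, and a missing jar only shows up later as a NoClassDefFoundError. As a sketch, the helper below joins the jar paths you pass to it and warns about any that are not on disk; the function name join_existing_jars is made up for illustration.

```shell
# Join the jar paths given as arguments with ':' and print the result.
# Paths that do not exist are left out and reported on stderr.
join_existing_jars() {
  classpath=""
  for jar in "$@"; do
    if [ -f "$jar" ]; then
      # Add a ':' separator only when the classpath is not empty.
      classpath="${classpath:+$classpath:}$jar"
    else
      echo "missing jar: $jar" >&2
    fi
  done
  printf '%s\n' "$classpath"
}
```

For example, export HADOOP_CLASSPATH="$(join_existing_jars "$HBASE_HOME/hbase-0.94.2.jar" "$HBASE_HOME/lib/zookeeper-3.4.3.jar")" sets the classpath from the jars that are present, and the same call can be extended with the rest of the list above.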
