Use Apache HBase when you need random, real-time read/write access to your Big Data. This project's goal is the hosting of very large tables -- billions of rows × millions of columns -- atop clusters of commodity hardware. Apache HBase is an open-source, distributed, versioned, column-oriented store modeled after Google's Bigtable (Bigtable: A Distributed Storage System for Structured Data, Chang et al.). Just as Bigtable leverages the distributed data storage provided by the Google File System, Apache HBase provides Bigtable-like capabilities on top of Hadoop and HDFS.
This HBase 0.94.2 installation was done on the following versions of Linux, Java, and Hadoop:
Ubuntu 12.04 LTS
Java 1.7.0_09
Hadoop 1.1.0
I use hduser as a dedicated Hadoop system user, and Hadoop is installed in the /home/hduser/hadoop folder. I am going to install HBase in the /home/hduser folder. Change to the hduser home directory and execute the commands below.
Download the HBase 0.94.2 tarball using wget.
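A typical download command looks like the following; the Apache archive URL is an assumption, so adjust the mirror if the release has moved.

```shell
# Fetch the HBase 0.94.2 tarball from the Apache archive (URL assumed).
wget http://archive.apache.org/dist/hbase/hbase-0.94.2/hbase-0.94.2.tar.gz
```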
Unzip the tar file.
sudo tar xzf hbase-0.94.2.tar.gz
Change the name to hbase.
sudo mv hbase-0.94.2 hbase
Set JAVA_HOME and HBASE_CLASSPATH in hbase-env.sh. The hbase-env.sh file exists in the conf folder of HBase [/home/hduser/hbase/conf/hbase-env.sh].
Change
# The java implementation to use. Required.
# export JAVA_HOME=/usr/lib/j2sdk1.5-sun
to
# The java implementation to use. Required.
export JAVA_HOME=/usr/lib/jvm/java-1.7.0-openjdk-amd64
Add the HBase conf directory to the HBASE_CLASSPATH.
export HBASE_CLASSPATH=/home/hduser/hbase/conf/
Set the HBASE_HOME path and add HBase's bin directory to the PATH.
export HBASE_HOME=/home/hduser/hbase
export PATH=${PATH}:${HBASE_HOME}/bin
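If you want HBASE_HOME and the updated PATH available in every shell (not just inside hbase-env.sh), one option is to append them to hduser's ~/.bashrc; this sketch assumes the paths used above.

```shell
# Assumed to be appended to /home/hduser/.bashrc so every login shell
# of hduser picks up the HBase environment variables.
export HBASE_HOME=/home/hduser/hbase
export HBASE_CLASSPATH=$HBASE_HOME/conf
export PATH=${PATH}:${HBASE_HOME}/bin
```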
Configure hbase-site.xml. The hbase-site.xml file exists in the conf folder of HBase [/home/hduser/hbase/conf/hbase-site.xml].
I have a 3-node Hadoop cluster: one master and two slaves. The master node runs the namenode, secondarynamenode, and jobtracker; each slave node runs a datanode and a tasktracker.
<configuration>
<property>
<name>hbase.rootdir</name>
<value>hdfs://master:54310/hbase
</value>
<description>
The directory shared by region servers.
Should be fully-qualified to include the filesystem to use.
E.g:
hdfs://NAMENODE_SERVER:PORT/HBASE_ROOTDIR
</description>
</property>
<property>
<name>hbase.cluster.distributed</name>
<value>true</value>
<description>The mode the cluster
will be in. Possible values are
false: standalone and pseudo-distributed
setups with managed Zookeeper
true: fully-distributed with unmanaged
Zookeeper Quorum (see hbase-env.sh)
</description>
</property>
<property>
<name>hbase.zookeeper.quorum</name>
<value>10.146.244.133</value>
<description>Comma separated list of
servers in the ZooKeeper Quorum.
For example,
"host1.mydomain.com,host2.mydomain.com,host3.mydomain.com".
By default this is set to localhost for
local and pseudo-distributed modes
of operation. For a fully-distributed
setup, this should be set to a full
list of ZooKeeper quorum servers. If
HBASE_MANAGES_ZK is set in hbase-env.sh
this is the list of servers which we will
start/stop ZooKeeper on.
</description>
</property>
<property>
<name>hbase.zookeeper.dns.nameserver</name>
<value>10.146.244.133</value>
<description> The host
name or IP address of the name server (DNS) which a ZooKeeper server should use
to determine the host name used by the master for communication and display
purposes.
</description>
</property>
<property>
<name>hbase.regionserver.dns.nameserver</name>
<value>10.146.244.133</value>
<description> The host
name or IP address of the name server (DNS) which a region server should use to
determine the host name used by the master for communication and display
purposes.
</description>
</property>
<property>
<name>hbase.master.dns.nameserver</name>
<value>10.146.244.133</value>
<description> The host
name or IP address of the name server (DNS) which a master should use to
determine the host name used for communication and display purposes. </description>
</property>
</configuration>
In the following properties I used the master's IP address:
hbase.zookeeper.quorum
hbase.zookeeper.dns.nameserver
hbase.master.dns.nameserver
hbase.regionserver.dns.nameserver
Specify the RegionServers. The regionservers file exists in the conf folder of HBase
[/home/hduser/hbase/conf/regionservers].
master
slave1
slave2
Here I have specified the master as a regionserver as well.
Remote-copy the hbase folder from the master node to the slave nodes.
scp -r /home/hduser/hbase 10.146.244.62:/home/hduser/hbase [Slave1]
scp -r /home/hduser/hbase 10.146.242.32:/home/hduser/hbase [Slave2]
Run HBase. Start the cluster with start-hbase.sh, then open the shell with the hbase shell command.
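Once the shell is up, a quick sanity check might look like the session below; the table name 'test' and column family 'cf' are just placeholder examples.

```
create 'test', 'cf'
put 'test', 'row1', 'cf:a', 'value1'
scan 'test'
disable 'test'
drop 'test'
```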
NOTE: If you get an error like "DNS name not found", you have to create forward and reverse lookup zones for the IPs. BIND9 can be used to create the lookups.
REST API.
To start the HBase REST server, use the command below.
hbase rest start [the REST server starts listening on port 8080]
We can choose our own port using the -p option:
hbase rest start -p 9090
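With the REST server running, one quick way to confirm it is reachable is to request its version string; the host and port here assume the server is on the master node with the default port.

```shell
# Ask the HBase REST server for its version info; adjust host/port to your setup.
curl http://master:8080/version
```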
HBase MapReduce
To run MapReduce jobs in Hadoop that use HBase as input or output, you have to add the HBase jar files to the Hadoop classpath; otherwise you get NoClassDefFoundError errors.
I added the jars below to the HADOOP_CLASSPATH to get it working.
export HADOOP_CLASSPATH=$HBASE_HOME/hbase-0.94.2.jar:\
$HBASE_HOME/hbase-0.94.2-tests.jar:\
$HBASE_HOME/lib/zookeeper-3.4.3.jar:\
$HBASE_HOME/lib/avro-1.5.3.jar:\
$HBASE_HOME/lib/avro-ipc-1.5.3.jar:\
$HBASE_HOME/lib/commons-cli-1.2.jar:\
$HBASE_HOME/lib/jackson-core-asl-1.8.8.jar:\
$HBASE_HOME/lib/jackson-mapper-asl-1.8.8.jar:\
$HBASE_HOME/lib/commons-httpclient-3.1.jar:\
$HBASE_HOME/lib/jetty-6.1.26.jar:\
$HBASE_HOME/lib/hadoop-core-1.0.3.jar:\
$HBASE_HOME/lib/com.google.protobuf_2.3.0.jar
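Instead of listing each versioned jar by hand, an alternative sketch is to add every jar under lib/ with a loop; this assumes HBASE_HOME is set as described above.

```shell
# Build HADOOP_CLASSPATH from the HBase jars plus everything in lib/.
# Assumes HBASE_HOME points at the HBase install directory.
HADOOP_CLASSPATH=$HBASE_HOME/hbase-0.94.2.jar:$HBASE_HOME/hbase-0.94.2-tests.jar
for jar in "$HBASE_HOME"/lib/*.jar; do
  HADOOP_CLASSPATH=$HADOOP_CLASSPATH:$jar
done
export HADOOP_CLASSPATH
```

The loop picks up whatever versions ship in lib/, so it keeps working after minor HBase upgrades.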
All of the above jars come with HBase except com.google.protobuf_2.3.0.jar, which you have to download separately and add to the Hadoop classpath.
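One way to verify the classpath is to run the RowCounter MapReduce job that ships with the HBase jar against an existing table; 'mytable' below is just a placeholder name.

```shell
# Run HBase's bundled RowCounter job; replace 'mytable' with a real table.
hadoop jar $HBASE_HOME/hbase-0.94.2.jar rowcounter mytable
```

If the job submits and runs without NoClassDefFoundError, the HBase jars are on the Hadoop classpath.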