Wednesday, July 11, 2012

Bulkload data from Hadoop dfs to HBase

To bulkload data from HDFS into HBase, the following steps need to be completed before running the import.

1. Create the target table in HBase (c1 is the column family)
     create 't1', 'c1'
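
     (Optional) To confirm the table exists, the list and describe commands in the HBase shell will show it:
     ./hbase shell
     list
     describe 't1'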

2. Make a sample directory in HDFS
     hadoop dfs -mkdir sampledir

3. Put the sample data into HDFS
     hadoop dfs -put input.data sampledir
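
     By default importtsv reads tab-separated values, one row per line. A minimal input.data for this example might look like the lines below (placeholder values, fields separated by a single tab); the first field becomes the row key and the second goes into column family c1, matching the column mapping in step 4:
     row1	valueA
     row2	valueB
     row3	valueC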

4. Import the data from HDFS into HBase with importtsv
    ./hadoop jar /opt/hbase/hbase-0.90.3.jar importtsv -Dimporttsv.columns=HBASE_ROW_KEY,c1 t1 sampledir
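
    Once the MapReduce job finishes, a quick sanity check from the HBase shell (not part of the import itself) should show the loaded rows:
    ./hbase shell
    scan 't1'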

If you encounter the following error:
 Error: java.lang.ClassNotFoundException: org.apache.zookeeper.KeeperException


copy the ZooKeeper jar from HBase's lib directory into Hadoop's lib directory, so the hadoop command can find the ZooKeeper classes:
cp zookeeper-3.3.3.jar /hadoop/lib
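
With full paths, assuming Hadoop lives under /opt/hadoop-0.20.2 (as in the installation post below) and HBase under /opt/hbase, the copy looks like:
cp /opt/hbase/lib/zookeeper-3.3.3.jar /opt/hadoop-0.20.2/lib/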

Tuesday, July 10, 2012

HBase (Version 0.90.3) Installation

     The installation of ZooKeeper + Hadoop + HBase is quite complicated. The manual and online resources explain several ways to install the packages, but sometimes tragedy still happens. Here is my solution for installing HBase 0.90.3 against Hadoop 0.20.2.


1. Make sure the versions of HBase and Hadoop match.
     If your Hadoop version is 0.20.2, do not install an HBase release built against a different Hadoop version.
     Otherwise an error like this shows up:
     FATAL org.apache.hadoop.hbase.master.HMaster: Unhandled exception. Starting shutdown.
     java.io.IOException: Call to node1:9000 failed on local exception: java.io.EOFException
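
     To double-check, hadoop version prints the version of the running cluster, and the hadoop-core jar bundled in HBase's lib directory shows what HBase was built against (paths assumed from the later steps):
     /opt/hadoop-0.20.2/bin/hadoop version
     ls /opt/hbase/lib/ | grep hadoop-core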


2. Untar the file hbase-0.90.3.tar.gz with tar -zxvf hbase-0.90.3.tar.gz


3. Move the extracted directory to /opt/hbase
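
     Concretely, steps 2 and 3 amount to something like this (assuming the tarball is in the current directory):
     tar -zxvf hbase-0.90.3.tar.gz
     mv hbase-0.90.3 /opt/hbase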


4. Configure regionservers, hbase-site.xml, and hbase-env.sh


   regionservers: list the hostnames of the Hadoop master and slaves here, one per line (see the sample file after the hbase-env.sh settings below)
   hbase-site.xml: (if .META. is corrupted, be sure to change hbase.rootdir from the original path, e.g. hdfs://node1:9000/hbase, to a new one such as hdfs://node1:9000/hbase1)

<?xml version="1.0"?>
<configuration>
  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://node1:9000/hbase</value>
    <description>The directory shared by region servers.</description>
  </property>
  <property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
  </property>
  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>node1</value>
  </property>
</configuration>

  hbase-env.sh:
 export HBASE_OPTS="-ea -XX:+UseConcMarkSweepGC -XX:+CMSIncrementalMode"
 export HBASE_REGIONSERVERS=${HBASE_HOME}/conf/regionservers
 export HBASE_LOG_DIR=${HBASE_HOME}/logs
 export HBASE_PID_DIR=/var/hadoop/hbase-pids
 export HBASE_MANAGES_ZK=false
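
   Since HBASE_MANAGES_ZK=false, HBase expects a separately managed ZooKeeper (the QuorumPeerMain process in step 7 below). For reference, a sample conf/regionservers file, with node1/node2/node3 as placeholder hostnames for your own master and slaves:
   node1
   node2
   node3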

5. Replace the jar file (IMPORTANT!)
     HBase ships with its own Hadoop core jar; it must match the jar used by the running Hadoop cluster, so swap it out:
     rm /opt/hbase/lib/hadoop-core-0.20-append-r1056497.jar
     cp /opt/hadoop-0.20.2/hadoop-0.20.2-core.jar /opt/hbase/lib/

6. Start HBase:
     sh /opt/hbase/bin/start-hbase.sh

7. Use jps to check that the expected processes are running:

      10700 TaskTracker        -- from Hadoop
      10494 SecondaryNameNode  -- from Hadoop
      2676  RunJar             -- from Hadoop
      10232 NameNode           -- from Hadoop
      10575 JobTracker         -- from Hadoop
      10361 DataNode           -- from Hadoop
      3541  HMaster            -- from HBase
      3686  HRegionServer      -- from HBase
      31336 QuorumPeerMain     -- from ZooKeeper

8. ./hbase shell (time to start using HBase!)
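
      Once the shell is up, a few basic commands serve as a quick smoke test (the test table and values here are arbitrary examples):
      status
      create 'test', 'cf'
      put 'test', 'row1', 'cf:a', 'value1'
      scan 'test'
      disable 'test'
      drop 'test'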

9. If you encounter the error java.io.IOException: HRegionInfo was null or empty in -ROOT-,
your .META. is corrupted; apply the note from step 4 (point hbase.rootdir in hbase-site.xml at a fresh directory) to resolve the issue.

GOOD LUCK