Wednesday, July 11, 2012

Bulkload data from Hadoop dfs to HBase

To bulk-load data from Hadoop DFS into HBase, work through the following steps. The first three prepare for the import itself.

1. Create the table in HBase (from the HBase shell)
     create 't1', 'c1'

2. Make sample directory in dfs
     hadoop dfs -mkdir sampledir

3. Put sample data into hadoop dfs
     hadoop dfs -put input.data sampledir

4. Import the data from Hadoop DFS into HBase
    ./hadoop jar /opt/hbase/hbase-0.90.3.jar importtsv -Dimporttsv.columns=HBASE_ROW_KEY,c1 t1 sampledir
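
importtsv reads tab-separated text, so with -Dimporttsv.columns=HBASE_ROW_KEY,c1 each line of input.data presumably holds a row key, a tab, and the value for c1. Below is a minimal Python sketch for generating such a file. The helper names and the sample rows are made up for illustration and are not part of HBase.

```python
def to_tsv_lines(rows):
    """Turn (row_key, value) pairs into tab-separated lines.

    Assumes one value per row, matching the column spec HBASE_ROW_KEY,c1.
    """
    lines = []
    for row_key, value in rows:
        for field in (row_key, value):
            # A tab or newline inside a field would shift or split the columns.
            if "\t" in field or "\n" in field:
                raise ValueError("field contains a tab or newline: %r" % field)
        lines.append(row_key + "\t" + value)
    return lines


def write_input_file(path, rows):
    """Write the lines to a local file, ready for 'hadoop dfs -put'."""
    with open(path, "w") as handle:
        for line in to_tsv_lines(rows):
            handle.write(line + "\n")


if __name__ == "__main__":
    # Hypothetical sample rows; replace with real data.
    write_input_file("input.data", [("row1", "value1"), ("row2", "value2")])
```

Running the script writes input.data to the current directory, which step 3 then uploads.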

If you encounter the following error:
 Error: java.lang.ClassNotFoundException: org.apache.zookeeper.KeeperException

copy the ZooKeeper jar file into Hadoop's lib directory so that the MapReduce job can find the class:
cp zookeeper-3.3.3.jar /hadoop/lib

Tuesday, July 10, 2012

HBase (Version 0.90.3) Installation

     Installing ZooKeeper + Hadoop + HBase is quite complicated. The manual and online resources explain several ways to install the full stack, but things still go wrong sometimes. Here is how I installed HBase 0.90.3 against Hadoop 0.20.3.


1. Make sure the versions of HBase and Hadoop match.
     If you run Hadoop 0.20.3, do not try to install an HBase version that does not match it.
     Otherwise errors like the following occur:
     FATAL org.apache.hadoop.hbase.master.HMaster: Unhandled exception. Starting shutdown.
     java.io.IOException: Call to node1:9000 failed on local exception: java.io.EOFException


2. Untar the file hbase-0.90.3.tar.gz: tar -zxvf hbase-0.90.3.tar.gz


3. Place the directory under /opt/hbase


4. Configure regionservers, hbase-site.xml, hbase-env.sh


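   regionservers: list the hostnames of the Hadoop master and slaves here
   hbase-site.xml: (if .META. is corrupted, be sure to change rootdir from the original path, say hdfs://node1:9000/hbase, to a new one such as hdfs://node1:9000/hbase1)

   <configuration>
     <property>
       <name>hbase.rootdir</name>
       <value>hdfs://node1:9000/hbase</value>
       <description>The directory shared by region servers.</description>
     </property>
     <property>
       <name>hbase.cluster.distributed</name>
       <value>true</value>
     </property>
     <property>
       <name>hbase.zookeeper.quorum</name>
       <value>node1</value>
     </property>
   </configuration>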



  hbase-env.sh:
 export HBASE_OPTS="-ea -XX:+UseConcMarkSweepGC -XX:+CMSIncrementalMode"
 export HBASE_REGIONSERVERS=${HBASE_HOME}/conf/regionservers
 export HBASE_LOG_DIR=${HBASE_HOME}/logs
 export HBASE_PID_DIR=/var/hadoop/hbase-pids
 export HBASE_MANAGES_ZK=false

5. Replace the Hadoop jar bundled with HBase so that it matches the cluster's jar (IMPORTANT!)
     rm /opt/hbase/lib/hadoop-core-0.20-append-r1056497.jar
     cp /opt/hadoop-0.20.2/hadoop-0.20.0-core.jar /opt/hbase/lib/

6. sh start-hbase.sh

7. Use jps to check the running processes

      10700 TaskTracker -- from Hadoop
      10494 SecondaryNameNode -- from Hadoop
      2676 RunJar -- from Hadoop
      10232 NameNode -- from Hadoop
      10575 JobTracker -- from Hadoop
      10361 DataNode -- from Hadoop
      3541 HMaster -- from HBase
      3686 HRegionServer -- from HBase
      31336 QuorumPeerMain -- from ZooKeeper

8. ./hbase shell (start using HBase)

9. If you encounter the error java.io.IOException: HRegionInfo was null or empty in -ROOT-,
your meta table is corrupted. Change hbase.rootdir in hbase-site.xml as described in step 4 to resolve the issue.

GOOD LUCK

Tuesday, April 17, 2012

403 Forbidden Problem While Using JSON POST on Django

     When using an AJAX POST in Django, the browser keeps reporting a JS error: 403 Forbidden. The cause turned out to be Django's CSRF protection rejecting the request. After some effort tracking this down, the following steps may help solve the problem.

1. Append the line 'django.middleware.csrf.CsrfResponseMiddleware' to MIDDLEWARE_CLASSES in settings.py

2. Include the following JS code after jQuery

source: https://docs.djangoproject.com/en/dev/ref/contrib/csrf/


$(document).ajaxSend(function(event, xhr, settings) {
    function getCookie(name) {
        var cookieValue = null;
        if (document.cookie && document.cookie != '') {
            var cookies = document.cookie.split(';');
            for (var i = 0; i < cookies.length; i++) {
                var cookie = jQuery.trim(cookies[i]);
                // Does this cookie string begin with the name we want?
                if (cookie.substring(0, name.length + 1) == (name + '=')) {
                    cookieValue = decodeURIComponent(cookie.substring(name.length + 1));
                    break;
                }
            }
        }
        return cookieValue;
    }
    function sameOrigin(url) {
        // url could be relative or scheme relative or absolute
        var host = document.location.host; // host + port
        var protocol = document.location.protocol;
        var sr_origin = '//' + host;
        var origin = protocol + sr_origin;
        // Allow absolute or scheme relative URLs to same origin
        return (url == origin || url.slice(0, origin.length + 1) == origin + '/') ||
            (url == sr_origin || url.slice(0, sr_origin.length + 1) == sr_origin + '/') ||
            // or any other URL that isn't scheme relative or absolute i.e relative.
            !(/^(\/\/|http:|https:).*/.test(url));
    }
    function safeMethod(method) {
        return (/^(GET|HEAD|OPTIONS|TRACE)$/.test(method));
    }

    if (!safeMethod(settings.type) && sameOrigin(settings.url)) {
        xhr.setRequestHeader("X-CSRFToken", getCookie('csrftoken'));
    }
});
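
For step 1, the entry in settings.py might look like the sketch below. Only the last entry comes from this post. The other entries are what a default project of that era typically listed and are an assumption here, so keep whatever your own settings.py already contains. As far as I know, CsrfResponseMiddleware exists only in older Django releases, so this fragment is tied to the Django version in use.

```python
# settings.py (fragment)
MIDDLEWARE_CLASSES = (
    # Assumed existing entries; keep your project's own list.
    'django.middleware.common.CommonMiddleware',
    'django.contrib.sessions.middleware.SessionMiddleware',
    'django.middleware.csrf.CsrfViewMiddleware',
    'django.contrib.auth.middleware.AuthenticationMiddleware',
    'django.contrib.messages.middleware.MessageMiddleware',
    # The line appended in step 1.
    'django.middleware.csrf.CsrfResponseMiddleware',
)
```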

Tuesday, February 7, 2012

How to Tune Database and SQL

    As a database engineer with three years of experience, tuning databases and SQL has almost become my daily work. Here are some tips for database and SQL tuning.

For SQL Tuning:

1. Use GROUP BY rather than DISTINCT:
     DISTINCT sorts the rows first and then reduces them to an ordered set of unique values. That sorting step may take more time than GROUP BY, though this depends on the database and its version, so compare the two plans with EXPLAIN.

2. Use EXPLAIN to view the SQL execution plan:
     Check whether the indexes are actually used during joins. If an index is never used, you may not need it, and removing unused indexes speeds up inserts. In the WHERE clause, do not wrap the compared column in a stored procedure call or a string concatenation; doing so prevents the index from being used.
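
A small sketch of that last point, using SQLite from Python's standard library as a stand-in. This is an assumption: the post does not name its database, and SQLite's command is EXPLAIN QUERY PLAN rather than plain EXPLAIN. The table, index, and column names are invented for illustration. The sketch compares the plan for a plain equality test on an indexed column with the plan for the same test wrapped in a string concatenation.

```python
import sqlite3


def query_plan(conn, sql):
    """Return the plan text SQLite reports for one statement."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    # The last column of each row holds the human-readable plan step.
    return " | ".join(row[-1] for row in rows)


def build_demo_db():
    """Create an in-memory table with an index on the name column."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, note TEXT)")
    conn.execute("CREATE INDEX idx_users_name ON users (name)")
    conn.executemany(
        "INSERT INTO users (name, note) VALUES (?, ?)",
        [("user%d" % i, "n/a") for i in range(1000)],
    )
    return conn


conn = build_demo_db()

# Plain comparison on the indexed column: the index can be used.
plain_plan = query_plan(conn, "SELECT * FROM users WHERE name = 'user42'")

# Concatenation applied to the column: the index on name no longer applies.
concat_plan = query_plan(conn, "SELECT * FROM users WHERE name || '' = 'user42'")

print(plain_plan)
print(concat_plan)
```

The exact wording of the plans varies between SQLite versions, but the first should mention idx_users_name and the second should not.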

3. How many indexes should you use on one table?
     It depends. An index may speed up queries and joins, but it also consumes space (roughly doubling the size used for each indexed column) and slows down inserts.

4. Use result-set insertion rather than prepared statements, and prepared statements rather than plain query execution:
     Parsing and interpreting SQL syntax is a bottleneck when executing a statement. Avoiding repeated parsing can noticeably improve execution performance.

5. Understand the differences between scan methods, join methods, and join order:
     Scan methods: Sequential Scan, Index Scan, Bitmap Scan

  •      Sequential Scan: Best for scanning small tables.
  •      Index Scan: Uses a B-tree index to scan large tables.
  •      Bitmap Scan: Best when several indexes are combined.

     Join methods: Nested Loop, Hash Join, Merge Join

  •      Nested Loop: Best when the two tables have indexes on the join columns. It joins the tables with two nested for-loops, so it suits small tables.
  •      Hash Join: Building the hash table (the initial setup) takes time, but once it is built the join is much faster. The hash table is built in memory, so make sure memory can hold it.
  •      Merge Join: Ideal for large tables; the data must be sorted before joining.

     Join order: Check whether the most effective index is used.
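
The three join methods can be sketched in plain Python to show where their costs come from. This is an illustration only, not how any particular database implements them. The function names and the sample rows are made up, and rows are (key, payload) tuples joined on equal keys.

```python
def nested_loop_join(left, right):
    """Compare every left row with every right row: two nested for-loops."""
    return [(lrow, rrow) for lrow in left for rrow in right if lrow[0] == rrow[0]]


def hash_join(left, right):
    """Build an in-memory hash table on one side, then probe it with the other."""
    table = {}
    for rrow in right:  # build phase: the up-front cost, held in memory
        table.setdefault(rrow[0], []).append(rrow)
    result = []
    for lrow in left:  # probe phase: one lookup per left row
        for rrow in table.get(lrow[0], []):
            result.append((lrow, rrow))
    return result


def merge_join(left, right):
    """Sort both inputs on the key, then walk them in step."""
    left = sorted(left, key=lambda row: row[0])
    right = sorted(right, key=lambda row: row[0])
    result = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i][0] < right[j][0]:
            i += 1
        elif left[i][0] > right[j][0]:
            j += 1
        else:
            key = left[i][0]
            # Find the run of right rows sharing this key.
            j_end = j
            while j_end < len(right) and right[j_end][0] == key:
                j_end += 1
            # Pair every left row having this key with that run.
            while i < len(left) and left[i][0] == key:
                for rrow in right[j:j_end]:
                    result.append((left[i], rrow))
                i += 1
            j = j_end
    return result


orders = [(2, "order-a"), (1, "order-b"), (2, "order-c")]
customers = [(1, "alice"), (2, "bob"), (3, "carol")]

for join in (nested_loop_join, hash_join, merge_join):
    print(join.__name__, sorted(join(orders, customers)))
```

All three return the same pairs. They differ in the work done: the nested loop compares every combination, the hash join pays once to build the table, and the merge join pays for the two sorts.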

 

Tuesday, January 24, 2012

Migrating Web Service to Java Applet (2)

     Before migrating the whole web service to an applet, the hardest part is changing the mindset from developing web pages to developing a Java applet.

     Take web page design as an example:
In Django, the directory has the following structure


     Basically, every HTML page is placed under templates, and the view and controller code lives under the project_app you created. urls.py sits in between and decides which view's result is returned for a given page. Through template inheritance, you can extend a page's functionality by extending base.html.




However, in a Java applet, the concept of laying out a page is not that straightforward. Every component is drawn on a canvas, and every component must be placed in a container.
1. So first of all, a frame (JFrame) is created.
2. Within the frame, a panel (ContentContainer) is placed to hold the different components.
3. For the menu list, a ContentPanel is placed at the top of the frame.
4. Under the ContentPanel, to switch panels when different menu items are clicked, a CardLayout manages the different panels.
5. Each time a menu item is clicked, the matching panel within the CardLayout is shown.

Sunday, January 22, 2012

Migrating Web Service to Java Applet (1)

     To meet a requirement to implement a data search engine for my customers, I first built a prototype tool using Python + Django.

     Thanks to the online resources and the strong Python and Django community, development was blazing fast. It took me no more than three days to complete the task.

    However, installing all the components on other people's computers is rather complicated, because I mixed in some elements from Java. Therefore, the only way to deliver the product was to have it installed on a VM. It sounds like a stupid idea (the whole idea was not initiated by me, but as an employee, what can I decide?). Thus, I am now trying to migrate the whole thing from Python + Django to a Java applet.

     Since it has been a long time since I last wrote a Java applet, I did some research on the topic.

Here comes the implementation recipe: 

Target: A JDBC interface to search sensitive data within target DB.

Requirements:
     1. A menu to access different pages.
     2. An asynchronous access method.
     3. Multiple DB access interfaces
     4. Backend repository
     5. Statistical report

In phase 1, the menu structure is implemented as follows:

1. Create a JFrame container
     a. Create a JMenuBar for the menu list
        i. Create a JMenu under the JMenuBar

2. Create JPanel
     a. Create JScrollPane under panel
        i. Create JEditorPane under JScrollPane

To make a click on a menu display a different page, a mouse-clicked event is added to the JMenu.