HBase
HBase is an open source, non-relational, distributed database modeled after Google's BigTable and written in Java. It is developed as part of the Apache Software Foundation's Apache Hadoop project and runs on top of HDFS (Hadoop Distributed Filesystem), providing BigTable-like capabilities for Hadoop. That is, it provides a fault-tolerant way of storing large quantities of sparse data.
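
The idea of sparse storage can be illustrated with a small in-memory sketch in Java. This is not the HBase API; the class name SparseTableSketch and its methods are hypothetical, and the sketch assumes a BigTable-style model in which a table is a sorted map from row key to a map of columns. Only cells that are actually written take up space, so rows with different columns cost nothing for the columns they lack.

```java
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

/** Hypothetical in-memory sketch of a sparse, sorted table; not the HBase API. */
class SparseTableSketch {
    // row key -> (column name -> value); both levels are kept sorted by key.
    private final NavigableMap<String, NavigableMap<String, String>> rows = new TreeMap<>();

    /** Stores one cell. Cells that are never written occupy no space. */
    public void put(String rowKey, String column, String value) {
        rows.computeIfAbsent(rowKey, k -> new TreeMap<>()).put(column, value);
    }

    /** Returns the cell value, or null if that cell was never written. */
    public String get(String rowKey, String column) {
        NavigableMap<String, String> row = rows.get(rowKey);
        return row == null ? null : row.get(column);
    }

    /** Counts the cells actually stored across all rows. */
    public int cellCount() {
        int count = 0;
        for (Map<String, String> row : rows.values()) {
            count += row.size();
        }
        return count;
    }

    public static void main(String[] args) {
        SparseTableSketch table = new SparseTableSketch();
        table.put("row1", "name", "alice");
        table.put("row2", "email", "bob@example.org");
        System.out.println(table.get("row1", "name"));  // prints "alice"
        System.out.println(table.get("row1", "email")); // prints "null": the cell was never written
        System.out.println(table.cellCount());          // prints "2"
    }
}
```

A real deployment adds distribution, persistence, and fault tolerance on top of this model; the sketch covers only the sparse layout.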

HBase features compression, in-memory operation, and Bloom filters on a per-column basis, as outlined in the original BigTable paper. Tables in HBase can serve as the input and output for MapReduce jobs run in Hadoop, and may be accessed through the Java API as well as through REST, Avro, or Thrift gateway APIs.
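
A Bloom filter is a space-efficient probabilistic set that can report false positives but never false negatives. The following is a minimal sketch in Java, not HBase's own implementation: the class name BloomFilterSketch, the bit-array size, and the double-hashing scheme built on String.hashCode() are all assumptions chosen for illustration.

```java
import java.util.BitSet;

/** Minimal Bloom filter sketch; the hashing scheme is an illustrative assumption. */
class BloomFilterSketch {
    private final BitSet bits;
    private final int size;
    private final int hashCount;

    public BloomFilterSketch(int size, int hashCount) {
        this.bits = new BitSet(size);
        this.size = size;
        this.hashCount = hashCount;
    }

    // Derives the i-th bit position from two base hashes (double hashing).
    private int position(String key, int i) {
        int h1 = key.hashCode();
        int h2 = (h1 >>> 16) | 1; // second hash, forced to be nonzero
        return Math.floorMod(h1 + i * h2, size);
    }

    /** Sets every bit position derived from the key. */
    public void add(String key) {
        for (int i = 0; i < hashCount; i++) {
            bits.set(position(key, i));
        }
    }

    /** false means "definitely not in the set"; true means "possibly in the set". */
    public boolean mightContain(String key) {
        for (int i = 0; i < hashCount; i++) {
            if (!bits.get(position(key, i))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        BloomFilterSketch filter = new BloomFilterSketch(1024, 3);
        filter.add("row-17");
        System.out.println(filter.mightContain("row-17")); // true: added keys are never reported absent
        System.out.println(filter.mightContain("row-99")); // usually false, but a false positive is possible
    }
}
```

In a store like the one described above, such a filter allows a lookup to skip data that definitely does not contain the requested key, at the cost of an occasional unnecessary read when the filter reports a false positive.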

HBase is not a direct replacement for a classic SQL database, although recently its performance has improved, and it is now serving several data-driven websites, including Facebook's Messaging Platform.

History

Apache HBase began as a project by the company Powerset out of a need to process massive amounts of data for the purposes of natural language search. It is now a top-level Apache project and has generated considerable interest.

Facebook elected to implement its new messaging platform using HBase in November 2010.

See also

  • Apache Cassandra
  • Hypertable
  • MongoDB
  • Project Voldemort
  • Riak


The source of this article is Wikipedia, the free encyclopedia. The text of this article is licensed under the GFDL.
 