NoSQL Apache HBase Data Model Design

Data Model in HBase

In HBase, data is stored in tables that have rows and columns, much as in an RDBMS, but the analogy is not a helpful one. Instead, it can be helpful to think of an HBase table as a multi-dimensional map.

HBase Data Model Terminology

Table: An HBase table consists of multiple rows.

Row: A row in HBase consists of a row key and one or more columns with values associated with them.

Column: A column in HBase consists of a column family and a column qualifier, which are delimited by a : (colon) character.

Column family: A column family groups a set of columns and their values. Column families have to be defined when the table is created, so they should be considered carefully during schema design.

Column qualifier: A column qualifier is added to a column family to provide the index for a given piece of data.

Cell: A cell is a combination of row, column family, and column qualifier, and contains a value and a timestamp, which represents the value’s version.

Timestamp: A timestamp is written alongside each value and identifies a given version of that value. By default it represents the time on the RegionServer when the data was written, but you can specify a different timestamp value when you put data into the cell.

HBase’s API for data manipulation consists of three primary methods: Get, Put, and Scan. Gets and Puts are specific to particular rows and need the row key to be provided. Scans are done over a range of rows.
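
The difference between row-specific operations and range scans can be sketched with a small in-memory stand-in. This is a toy model in Python, not the HBase client API: the class `ToyTable`, its method names, and the sample row keys are all assumptions made for illustration. It assumes a scan range that includes the start row and excludes the stop row.

```python
import bisect


class ToyTable:
    """In-memory stand-in for a table whose rows are kept sorted by row key."""

    def __init__(self):
        self._keys = []   # sorted list of row keys (bytes)
        self._rows = {}   # row key -> {column: value}

    def put(self, row_key, column, value):
        # A put needs the row key; it creates the row if the key is new.
        if row_key not in self._rows:
            bisect.insort(self._keys, row_key)
            self._rows[row_key] = {}
        self._rows[row_key][column] = value

    def get(self, row_key):
        # A get also needs the row key and returns that single row.
        return dict(self._rows.get(row_key, {}))

    def scan(self, start_row, stop_row):
        # A scan walks the sorted keys with start_row <= key < stop_row.
        lo = bisect.bisect_left(self._keys, start_row)
        hi = bisect.bisect_left(self._keys, stop_row)
        for key in self._keys[lo:hi]:
            yield key, dict(self._rows[key])


table = ToyTable()
table.put(b"user#0003", b"info:name", b"Carol")
table.put(b"user#0001", b"info:name", b"Alice")
table.put(b"user#0002", b"info:name", b"Bob")

one_row = table.get(b"user#0002")
scanned = [key for key, _ in table.scan(b"user#0001", b"user#0003")]
```

Because the keys are kept sorted, the scan only has to locate the two ends of the range and read what lies between them, regardless of the order in which the rows were written.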

It’s easier to understand the data model as a multidimensional map. The first row from the table in Figure 1 is represented as a multidimensional map in Figure 2.
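
As a rough illustration of the map view, a row can be written as nested dictionaries: row key, then column family, then column qualifier, then timestamp, then value. The row below is hypothetical and does not reproduce the data in the figures; the family names, qualifiers, timestamps, and the helper `latest` are assumptions, and plain strings are used for readability where HBase would store bytes.

```python
# Hypothetical table with a single row, written as a multidimensional map.
table = {
    "row-0001": {                        # row key
        "personal": {                    # column family
            "name": {                    # column qualifier
                1700000002: "Alice B.",  # timestamp (version) -> value
                1700000001: "Alice",
            },
        },
        "office": {
            "phone": {
                1700000001: "555-0100",
            },
        },
    },
}


def latest(table, row, family, qualifier):
    """Return the value with the highest timestamp for one column of one row."""
    versions = table[row][family][qualifier]
    return versions[max(versions)]


newest_name = latest(table, "row-0001", "personal", "name")
older_name = table["row-0001"]["personal"]["name"][1700000001]
```

Each level of nesting corresponds to one dimension of the cell address, which is why a value is only fully identified once the row, the column (family and qualifier), and the version are all given.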


To define the schema, several properties of HBase tables have to be taken into account.
1. Indexing is done only on the row key.
2. Tables are stored sorted by row key. Each region of a table is responsible for a part of the row key space and is identified by its start and end row key. The region contains a sorted list of rows from the start key to the end key.
3. Everything in HBase tables is stored as a byte[]. There are no types.
4. Atomicity is guaranteed only at the row level. There is no atomicity guarantee across rows, which means that there are no multi-row transactions.
5. Column families have to be defined up front, at table creation time.
6. Column qualifiers are dynamic and can be defined at write time. They are stored as byte[], so you can even put data in them.
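
Properties 2 and 3 interact: since row keys are untyped bytes, the sort order is the byte order of whatever encoding the application chooses. The sketch below, in plain Python rather than HBase code, shows numeric ids sorting unexpectedly when written as decimal text and sorting numerically when packed as fixed-width big-endian integers. It also shows how a row key could be mapped to a region by its start key. The region start keys and the function `region_for` are hypothetical, and the sketch assumes that a region includes its start key and that the first region starts at the empty key.

```python
import bisect
import struct

ids = [1, 2, 9, 10, 100]

# Written as decimal text, the keys sort byte by byte, not numerically.
text_keys = sorted(str(i).encode("ascii") for i in ids)

# Fixed-width big-endian encoding keeps byte order equal to numeric order
# for non-negative integers.
packed_keys = sorted(struct.pack(">Q", i) for i in ids)
decoded = [struct.unpack(">Q", key)[0] for key in packed_keys]

# Hypothetical regions, each identified by its start key.
region_starts = [b"", b"g", b"p"]


def region_for(row_key):
    """Return the index of the region whose key range contains row_key."""
    return bisect.bisect_right(region_starts, row_key) - 1


apple_region = region_for(b"apple")
g_region = region_for(b"g")
zebra_region = region_for(b"zebra")
```

Here `text_keys` places `b'10'` and `b'100'` before `b'2'`, which is usually not what a range scan over numeric ids wants, while `decoded` comes back in numeric order.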

Data Model Operations

1) Get: Returns the attributes for a specified row. Gets are executed via HTable.get.
2) Put: Either adds a new row to a table (if the key is new) or updates an existing row (if the key already exists). Puts are executed via HTable.put (writeBuffer) or HTable.batch (non-writeBuffer).
3) Scan: Allows iteration over multiple rows for specified attributes.
4) Delete: Removes a row from a table. Deletes are executed via HTable.delete.
In newer client versions the same methods are exposed through the Table interface, which replaces direct use of HTable.
HBase does not modify data in place, so deletes are handled by writing new markers called tombstones. These tombstones, along with the dead values, are cleaned up on major compaction.
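
The tombstone mechanism can be sketched with an append-only toy store. This is a simplified model, not how HBase is implemented: the class `ToyStore` and its methods are assumptions, deletes here target a single column rather than a whole row, and a tombstone is assumed to mask every version of that column with a timestamp at or below its own.

```python
class ToyStore:
    """Append-only toy store; entries are never edited in place."""

    def __init__(self):
        # Each entry is (row, column, timestamp, kind, value).
        self.entries = []

    def put(self, row, column, ts, value):
        self.entries.append((row, column, ts, "put", value))

    def delete(self, row, column, ts):
        # A delete only appends a tombstone; the old value stays stored.
        self.entries.append((row, column, ts, "tombstone", None))

    def _tombstone_ts(self, row, column):
        stamps = [e[2] for e in self.entries
                  if e[0] == row and e[1] == column and e[3] == "tombstone"]
        return max(stamps, default=None)

    def get(self, row, column):
        # Reads skip every version masked by a tombstone.
        cutoff = self._tombstone_ts(row, column)
        live = [e for e in self.entries
                if e[0] == row and e[1] == column and e[3] == "put"
                and (cutoff is None or e[2] > cutoff)]
        return max(live, key=lambda e: e[2])[4] if live else None

    def major_compact(self):
        # Rewrite the store without tombstones and without masked values.
        kept = []
        for e in self.entries:
            if e[3] != "put":
                continue
            cutoff = self._tombstone_ts(e[0], e[1])
            if cutoff is None or e[2] > cutoff:
                kept.append(e)
        self.entries = kept


store = ToyStore()
store.put("row1", "cf:a", 1, "old")
store.put("row1", "cf:b", 1, "keep")
store.delete("row1", "cf:a", 2)

entries_before = len(store.entries)   # dead value and tombstone still stored
value_after_delete = store.get("row1", "cf:a")
store.major_compact()
entries_after = len(store.entries)    # only the live value remains
```

Until `major_compact` runs, the deleted value still occupies space and is merely hidden from reads, which is the behavior the paragraph above describes.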


A {row, column, version} tuple exactly specifies a cell in HBase. It’s possible to have an unbounded number of cells where the row and column are the same but the cell address differs only in its version dimension. While row and column keys are expressed as bytes, the version is specified using a long integer.

The maximum number of versions to store for a given column is part of the column schema and is specified at table creation, or via an alter command.
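
A minimal sketch of versioned cells with a version limit follows, again as a toy Python model rather than HBase code. The class `VersionedColumns`, its `max_versions` parameter, and the sample values are assumptions. For simplicity the sketch trims surplus versions eagerly at write time, which is not necessarily when a real store would discard them.

```python
class VersionedColumns:
    """Toy store addressing each value by (row, column, version)."""

    def __init__(self, max_versions=3):
        self.max_versions = max_versions
        self.cells = {}   # (row, column) -> {version: value}

    def put(self, row, column, version, value):
        versions = self.cells.setdefault((row, column), {})
        versions[version] = value
        # Keep only the newest max_versions versions (eager trimming).
        for old in sorted(versions, reverse=True)[self.max_versions:]:
            del versions[old]

    def get(self, row, column, max_versions=1):
        # Return up to max_versions (version, value) pairs, newest first.
        versions = self.cells.get((row, column), {})
        newest = sorted(versions, reverse=True)[:max_versions]
        return [(v, versions[v]) for v in newest]


store = VersionedColumns(max_versions=2)
for version, value in [(1, "a"), (2, "b"), (3, "c"), (4, "d")]:
    store.put("row1", "cf:q", version, value)

newest_only = store.get("row1", "cf:q")
all_kept = store.get("row1", "cf:q", max_versions=5)
```

With a limit of two versions, only versions 4 and 3 survive the four writes, and a read that asks for a single version returns the newest one.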
