HBase is used to store large amounts of data, so the size of each record can make a huge difference to the total storage requirements of a solution. For example, storing 1 billion records with just 1 extra byte each needs ~1 GB of extra disk space.
The following section covers the details of a record in HBase. A record consists of the Row ID, Column Family Name, Column Qualifier, Timestamp and the Value.
As HBase is a column-oriented database, it stores each value together with its fully qualified key. For example, to store employee data with employeeId, firstName and lastName, the row key and column family name are repeated for each column. So what is the total disk space required to store this data?
First, the following shows a scan of the employee table with actual data:
hbase(main):011:0> scan 'employee'
ROW COLUMN+CELL
row1 column=data:employeeId, timestamp=1339960912955, value=123
row1 column=data:firstName, timestamp=1339960810908, value=Joe
row1 column=data:lastName, timestamp=1339960868771, value=Robert
HBase KeyValue format
HBase stores data in the KeyValue format. The following picture shows the details, including the data type and the size required by each field:
Fig 1: KeyValue format of HBase
So to calculate the record size:
Fixed part needed by KeyValue format = Key Length + Value Length + Row Length + CF Length + Timestamp + Key Type = ( 4 + 4 + 2 + 1 + 8 + 1) = 20 Bytes
Variable part needed by KeyValue format = Row + Column Family + Column Qualifier + Value
Total bytes required = Fixed part + Variable part
So for the above example, let's calculate the record size. The variable part of each column is Row + Column Family + Column Qualifier + Value:
First Column = 20 + (4 + 4 + 10 + 3) = 41 Bytes
Second Column = 20 + (4 + 4 + 9 + 3) = 40 Bytes
Third Column = 20 + (4 + 4 + 8 + 6) = 42 Bytes
Total size for row1 in the above example = 41 + 40 + 42 = 123 Bytes
To store 1 billion such records, the space required = 123 Bytes * 1 billion = ~123 GB
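The arithmetic above can be sketched in plain Java. This is only an illustration, not part of HBase: the class name KeyValueSizeEstimator and the method cellSize are hypothetical, and the sketch assumes that row keys, names and values are stored as UTF-8 strings. It counts only the serialized KeyValue bytes described in this post.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not an HBase class: estimates serialized KeyValue sizes.
final class KeyValueSizeEstimator {

    // Fixed part: key length (4) + value length (4) + row length (2)
    // + column family length (1) + timestamp (8) + key type (1) = 20 bytes.
    static final int FIXED_OVERHEAD = 4 + 4 + 2 + 1 + 8 + 1;

    // Fixed part plus the variable part (row + family + qualifier + value).
    static int cellSize(String row, String family, String qualifier, String value) {
        return FIXED_OVERHEAD
                + utf8Length(row)
                + utf8Length(family)
                + utf8Length(qualifier)
                + utf8Length(value);
    }

    // Assumption: every field is a UTF-8 encoded string.
    private static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        // The three cells of row1 from the scan output above.
        int rowSize = cellSize("row1", "data", "employeeId", "123")
                + cellSize("row1", "data", "firstName", "Joe")
                + cellSize("row1", "data", "lastName", "Robert");

        // Use a long so the multiplication does not overflow an int.
        long totalBytes = rowSize * 1_000_000_000L;

        System.out.println("Bytes per row: " + rowSize);
        System.out.println("Bytes for 1 billion rows: " + totalBytes);
    }
}
```

Because the row key and column family name are counted once per cell, shortening either of them reduces the size of every column in the row.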
Please see the following snapshot of KeyValue.java for details of the field types and sizes in the HBase KeyValue:
static byte [] createByteArray(final byte [] row, final int roffset,
    final int rlength, final byte [] family, final int foffset, int flength,
    final byte [] qualifier, final int qoffset, int qlength,
    final long timestamp, final Type type,
    final byte [] value, final int voffset, int vlength) {
  ..
  ..
  // Allocate right-sized byte array.
  byte [] bytes = new byte[KEYVALUE_INFRASTRUCTURE_SIZE + keylength + vlength];
  // Write key, value and key row length.
  int pos = 0;
  pos = Bytes.putInt(bytes, pos, keylength);
  pos = Bytes.putInt(bytes, pos, vlength);
  pos = Bytes.putShort(bytes, pos, (short)(rlength & 0x0000ffff));
  pos = Bytes.putBytes(bytes, pos, row, roffset, rlength);
  pos = Bytes.putByte(bytes, pos, (byte)(flength & 0x0000ff));
  if(flength != 0) {
    pos = Bytes.putBytes(bytes, pos, family, foffset, flength);
  }
  if(qlength != 0) {
    pos = Bytes.putBytes(bytes, pos, qualifier, qoffset, qlength);
  }
  pos = Bytes.putLong(bytes, pos, timestamp);
  pos = Bytes.putByte(bytes, pos, type.getCode());
  if (value != null && value.length > 0) {
    pos = Bytes.putBytes(bytes, pos, value, voffset, vlength);
  }
  return bytes;
}
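To see how the writes in the snapshot above add up to the sizes calculated earlier, here is a minimal sketch that reproduces the same field order with java.nio.ByteBuffer instead of the HBase Bytes utility. The class name KeyValueLayoutSketch and the method encode are hypothetical, the fields are assumed to be UTF-8 strings, and the type code 4 passed in main is only a placeholder value for a put. It is not the HBase implementation.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch, not an HBase class: mirrors the field order of the snapshot.
final class KeyValueLayoutSketch {

    static byte[] encode(String row, String family, String qualifier,
                         long timestamp, byte typeCode, String value) {
        byte[] r = row.getBytes(StandardCharsets.UTF_8);
        byte[] f = family.getBytes(StandardCharsets.UTF_8);
        byte[] q = qualifier.getBytes(StandardCharsets.UTF_8);
        byte[] v = value.getBytes(StandardCharsets.UTF_8);

        // Key = row length (2) + row + family length (1) + family
        //       + qualifier + timestamp (8) + key type (1).
        int keyLength = 2 + r.length + 1 + f.length + q.length + 8 + 1;

        // Two 4-byte ints (key length, value length) precede the key and the value.
        ByteBuffer buf = ByteBuffer.allocate(4 + 4 + keyLength + v.length);
        buf.putInt(keyLength);          // key length
        buf.putInt(v.length);           // value length
        buf.putShort((short) r.length); // row length
        buf.put(r);                     // row
        buf.put((byte) f.length);       // column family length
        buf.put(f);                     // column family
        buf.put(q);                     // column qualifier
        buf.putLong(timestamp);         // timestamp
        buf.put(typeCode);              // key type
        buf.put(v);                     // value
        return buf.array();
    }

    public static void main(String[] args) {
        // First cell of row1 from the scan output; 4 is a placeholder type code.
        byte[] cell = encode("row1", "data", "employeeId", 1339960912955L, (byte) 4, "123");
        System.out.println("Encoded length: " + cell.length);
    }
}
```

For the first column of row1 the key is 30 bytes long, so the encoded array is 4 + 4 + 30 + 3 = 41 bytes, which matches the size calculated for the first column.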