HBase provides a mechanism to pre-create the regions of a table, which helps distribute the data across the regions of the HBase cluster. See the following link for details on pre-creating regions in HBase:
http://hbase.apache.org/book/perf.writing.html
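As a rough illustration of pre-splitting, the sketch below computes evenly spaced split keys for row keys that start with a zero-padded numeric user ID. The class name SplitKeys, the method uniformSplits and the key width are assumptions made for this example, not part of HBase. The resulting byte[][] is the kind of array that could be passed to HBaseAdmin.createTable(desc, splitKeys) when the table is created.

```java
import java.nio.charset.StandardCharsets;

public class SplitKeys {

    /**
     * Returns numRegions - 1 split keys that cut the zero-padded decimal
     * key space [0, maxUserId) into equally sized ranges.
     * (Hypothetical helper, for illustration only.)
     */
    static byte[][] uniformSplits(int numRegions, long maxUserId, int width) {
        byte[][] splits = new byte[numRegions - 1][];
        for (int i = 1; i < numRegions; i++) {
            long boundary = maxUserId / numRegions * i;
            // Zero-pad so that lexicographic order matches numeric order
            splits[i - 1] = String.format("%0" + width + "d", boundary)
                    .getBytes(StandardCharsets.UTF_8);
        }
        return splits;
    }

    public static void main(String[] args) {
        // 4 regions over user IDs 0..999 -> split keys 0250, 0500, 0750
        for (byte[] split : uniformSplits(4, 1000, 4)) {
            System.out.println(new String(split, StandardCharsets.UTF_8));
        }
    }
}
```

With these split keys the table starts with four regions, and the first region has an empty start key.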
Row key design is what determines how the data is distributed across the regions. The row key can be a composite key, e.g. "User ID + Transaction ID". The following blog post provides the details:
http://riteshadval.blogspot.in/2012/03/hbase-composite-row-key-design-doing.html
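A minimal sketch of such a composite key is shown here, assuming numeric user and transaction IDs that are zero-padded to a fixed width. The class name CompositeKey, the separator and the widths are choices made for this example only. Because HBase sorts rows by key, all rows of one user end up next to each other.

```java
public class CompositeKey {

    /**
     * Builds a "User ID + Transaction ID" row key. Fixed-width zero padding
     * keeps lexicographic order identical to numeric order.
     * (Hypothetical helper, for illustration only.)
     */
    static String rowKey(long userId, long transactionId) {
        return String.format("%010d-%012d", userId, transactionId);
    }

    public static void main(String[] args) {
        // prints 0000000042-000000000007
        System.out.println(rowKey(42, 7));
        // Every key of user 9 sorts before every key of user 10
        System.out.println(rowKey(9, 100).compareTo(rowKey(10, 1)) < 0); // prints true
    }
}
```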
With such a composite key, the data of specific users ends up in specific regions. In some scenarios we might need to delete all the data for these users, for example when the users are no longer active.
The following method can be used to delete and recreate a complete region in HBase. Be careful to pass the right regionStartKey, because all the data in that region is deleted, without the need for a major compaction.
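import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HRegionInfo;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.regionserver.HRegion;
import org.apache.hadoop.hbase.util.Bytes;

// Deletes the region whose start key equals regionStartKey and recreates it empty.
// The table must be disabled before this method is called.
private void recreateRegion(String tableName, String regionStartKey) {
    HTable hTable = null;
    try {
        Configuration conf = HBaseConfiguration.create();
        hTable = new HTable(conf, tableName);
        HTableDescriptor desc = hTable.getTableDescriptor();
        byte[][] startKeys = hTable.getStartKeys();
        for (byte[] startKey : startKeys) {
            if (Bytes.toString(startKey).equals(regionStartKey)) {
                Path rootDir = new Path(conf.get("hbase.rootdir"));
                // Resolve the file system from the HBase root dir itself
                FileSystem fs = rootDir.getFileSystem(conf);
                HRegionInfo info = hTable.getRegionLocation(startKey).getRegionInfo();
                // Remove the region directory, including its HFiles, from HDFS
                System.out.println("deleting region - " + info);
                HRegion.deleteRegion(fs, rootDir, info);
                // Recreate an empty region with the same region info
                System.out.println("creating region - " + info);
                HRegion newRegion = HRegion.createHRegion(info, rootDir, conf, desc);
                newRegion.close();
                break;
            }
        }
    } catch (Exception e) {
        e.printStackTrace();
    } finally {
        // Close the table even if an exception was thrown above
        if (hTable != null) {
            try { hTable.close(); } catch (IOException ignored) { }
        }
    }
}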
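Because recreateRegion removes every row in the matching region, it can help to check beforehand which region a given row key falls into. The sketch below does this on plain strings with a sorted set of region start keys. The class RegionLookup and its method are hypothetical, and the example assumes ASCII keys, so that Java string order agrees with the byte order HBase uses.

```java
import java.util.Arrays;
import java.util.TreeSet;

public class RegionLookup {

    /**
     * Returns the start key of the region that would hold rowKey: the
     * greatest start key that is less than or equal to the row key.
     * The first region of a table has the empty string as start key.
     * (Hypothetical helper, for illustration only.)
     */
    static String regionStartFor(TreeSet<String> startKeys, String rowKey) {
        return startKeys.floor(rowKey);
    }

    public static void main(String[] args) {
        TreeSet<String> startKeys =
                new TreeSet<>(Arrays.asList("", "0250", "0500", "0750"));
        // Row key of user 0300 lies in the region starting at 0250
        System.out.println(regionStartFor(startKeys, "0300-000000000007")); // prints 0250
        // Row key of user 0100 lies in the first region (empty start key)
        System.out.println(regionStartFor(startKeys, "0100-000000000001").isEmpty()); // prints true
    }
}
```

If other users' row keys map to the same start key, recreating that region would delete their data as well.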
The following steps need to be followed:
1. Disable the table
disable 'TEST'
2. Check the HFiles on HDFS. In this example, region 9c52cac6d90c7a43d98887aed68179a7 contains 2 files.
3. Execute the recreateRegion method above for the table, passing the start key of this region
4. Check HDFS again. Both HFiles of region 9c52cac6d90c7a43d98887aed68179a7 are now deleted.
5. Enable the table again
enable 'TEST'
After the table is enabled again, all the data in that region is gone, and a scan of the table no longer returns any rows from that region. The benefit of this approach is that no major compaction is needed to delete a huge amount of data in that region.