HBase stores the data in the table lexicographically sorted by their row key. In a large cluster of HBase having many shards aka Regions, How the clients decides the right region to Write or Read the data?
Unlike many other NoSQL database the read/write request from the client does not goes to a component to route/fan-out to the right shard, instead the client are intelligent to route the request directly. It avoids the load on the single component to route request from all the clients.
The following diagram shows all the steps and the explanation follows:
Let's first see a simple Java program that writes the data in HBase using PUT method in 'TEST' table.
public class PutTest {
private static Configuration conf = HBaseConfiguration.create();
public static void addRecord(String tableName, String rowKey, String family, String qual, String value) throws Exception {
try {
HTable table = new HTable(conf, tableName);
Put put = new Put(Bytes.toBytes(rowKey));
put.add(Bytes.toBytes(family), Bytes.toBytes(qual), Bytes.toBytes(value));
table.put(put);
System.out.println("insert record "+rowKey+" to table "+tableName+" ok.");
} catch (IOException e) {
e.printStackTrace();
}
}
public static void main(String[] args) throws Exception{
String tablename = args[0];
PutTest.addRecord(tablename, "row-1", "data", "name", "Joe");
PutTest.addRecord(tablename, "row-1", "data", "age", "28");
}
}
The HTable class uses the following steps to find the region of "TEST" table and directly invokes the read/write operation.
- Step-1 to ZooKeeper: The client first connects to the ZooKeeper quorums to confirm that the HBase Master server is running and finds the location of the Region Server having the top level catalog table -ROOT- region in path /hbase/root-region-server.
- Step-2 to Root Region: Next client connects to the Region Server having -ROOT- region and fetches the location details of next level catalog table called .META.
- Step-3 to Meta Region: Again from the .META. table it fetches the details of the user table region having the row key required to read/write the data.
- Step-4 to user Table Region: Finally the client directly connects to the target region server holding the region of the user table "TEST" and performs the operations e.g. Get/Put/Delete etc. In case of Put operation, this region server stores the request in the MemStore and later flushed to disk.
The console output of the above program shows the following and confirms the connection with ZooKeeper, -ROOT- region server, .META. region server with Red, Blue and Green color.
12/07/17 07:28:42 INFO zookeeper.ZooKeeper: Client environment:zookeeper.version=3.4.3-1240972, built on 02/06/2012 10:48 GMT
12/07/17 07:28:42 INFO zookeeper.ZooKeeper: Client environment:host.name=bd-master-1
12/07/17 07:28:42 INFO zookeeper.ZooKeeper: Client environment:java.version=1.6.0_31
...
...
12/07/17 07:28:42 INFO zookeeper.ZooKeeper: Client environment:user.name=hadoop
12/07/17 07:28:42 INFO zookeeper.ZooKeeper: Client environment:user.home=/home/hadoop
12/07/17 07:28:42 INFO zookeeper.ZooKeeper: Client environment:user.dir=/home/hadoop/agg
12/07/17 07:28:42 INFO zookeeper.ZooKeeper: Initiating client connection, connectString=bd-slave-1:2181,bd-master-1:2181,bd-slave-2:2181 sessionTimeout=600000 watcher=hconnection
12/07/17 07:28:42 INFO zookeeper.ClientCnxn: Opening socket connection to server /192.168.10.12:2181(bd-slave-2)
12/07/17 07:28:42 INFO zookeeper.RecoverableZooKeeper: The identifier of this process is 11564@bd-master-1
12/07/17 07:28:42 INFO client.ZooKeeperSaslClient: Client will not SASL-authenticate because the default JAAS configuration section 'Client' could not be found. If you are not using SASL, you may ignore this. On the other hand, if you expected SASL to work, please fix your JAAS configuration.
12/07/17 07:28:42 INFO zookeeper.ClientCnxn: Socket connection established to bd-slave-2/192.168.10.12:2181, initiating session
12/07/17 07:28:42 INFO zookeeper.ClientCnxn: Session establishment complete on server bd-slave-2/192.168.10.12:2181, sessionid = 0x2388f9e3f5a0015, negotiated timeout = 600000
12/07/17 07:28:42 DEBUG client.HConnectionManager$HConnectionImplementation: Lookedup root region location, connection=org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation@4cb9e45a; serverName=bd-slave-4,60020,1342439310156
12/07/17 07:28:42 DEBUG client.HConnectionManager$HConnectionImplementation: Cached location for .META.,,1.1028785192 is bd-slave-8:60020
12/07/17 07:28:42 DEBUG client.MetaScanner: Scanning .META. starting at row=TEST,,00000000000000 for max=10 rows using org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation@4cb9e45a
12/07/17 07:28:42 DEBUG client.HConnectionManager$HConnectionImplementation: Cached location for TEST,,1342429701327.f1bd3b3d2d05bd54d8600dd095a25bfa. is bd-slave-6:60020
insert recored row-1 to table TEST ok.
insert recored row-1 to table TEST ok.
TCP/IP Packets
Also I captured the tcp/ip packets to confirms the flow of the HBase clients with the above components:
The following shows that the client is checking the location of the -ROOT- region with ZooKeeper quorums for path /hbase/root-region-server.
Step#2 to search the .META. region in -ROOT- catalog table.
0000 68 b5 99 bd 94 c8 00 25 b3 e2 bd 6c 08 00 45 00 h......% ...l..E.
0010 00 9f 91 05 40 00 40 06 13 92 c0 a8 0a 63 c0 a8 ....@.@. .....c..
0020 0a 0e c9 8b ea 74 5e 0a 1a 20 1d 04 bf d7 80 18 .....t^. . ......
0030 00 2e 96 53 00 00 01 01 08 0a b3 cd 88 46 1c 47 ...S.... .....F.G
0040 bb af 00 00 00 67 00 00 00 02 01 00 13 67 65 74 .....g.. .....get
0050 43 6c 6f 73 65 73 74 52 6f 77 42 65 66 6f 72 65 ClosestR owBefore
0060 00 00 00 00 00 00 00 1d 03 43 4e fa 00 00 00 03 ........ .CN.....
0070 0b 09 2d 52 4f 4f 54 2d 2c 2c 30 0b 2a 2e 4d 45 ..-ROOT- ,,0.*.ME
0080 54 41 2e 2c 54 45 53 54 2c 2c 39 39 39 39 39 39 TA.,TEST ,,999999
0090 39 39 39 39 39 39 39 39 2c 39 39 39 39 39 39 39 99999999 ,9999999
00a0 39 39 39 39 39 39 39 0b 04 69 6e 66 6f 9999999. .info
Step#3 to search the user table 'TEST' in .META.
0000 d8 d3 85 ba 94 d4 00 25 b3 e2 bd 6c 08 00 45 00 .......% ...l..E.
0010 00 ae d9 75 40 00 40 06 cb 0e c0 a8 0a 63 c0 a8 ...u@.@. .....c..
0020 0a 12 9d a0 ea 74 5d 87 52 91 bc e1 4d 8a 80 18 .....t]. R...M...
0030 00 36 96 66 00 00 01 01 08 0a b3 cd 88 63 1f 50 .6.f.... .....c.P
0040 c6 b3 00 00 00 76 00 00 00 06 01 00 0b 6f 70 65 .....v.. .....ope
0050 6e 53 63 61 6e 6e 65 72 00 00 00 00 00 00 00 1d nScanner ........
0060 03 43 4e fa 00 00 00 02 0b 09 2e 4d 45 54 41 2e .CN..... ...META.
0070 2c 2c 31 27 27 02 14 54 45 53 54 2c 2c 30 30 30 ,,1''..T EST,,000
0080 30 30 30 30 30 30 30 30 30 30 30 00 00 00 00 01 00000000 000.....
0090 ff ff ff ff ff ff ff ff 01 00 00 00 00 00 00 00 ........ ........
00a0 00 00 7f ff ff ff ff ff ff ff 01 00 00 00 01 04 ........ ........
00b0 69 6e 66 6f 00 00 00 00 00 00 00 00 info.... ....
And finally the Put method sends the information with user name 'Joe'.
0000 68 b5 99 bd 16 78 00 25 b3 e2 bd 6c 08 00 45 00 h....x.% ...l..E.
0010 00 f4 11 a9 40 00 40 06 92 97 c0 a8 0a 63 c0 a8 ....@.@. .....c..
0020 0a 10 e0 be ea 74 5e 42 9a cc c7 43 dc 9b 80 18 .....t^B ...C....
0030 00 2e 96 aa 00 00 01 01 08 0a b3 cd 88 6c 1f 50 ........ .....l.P
0040 2c 95 00 00 00 bc 00 00 00 0b 01 00 05 6d 75 6c ,....... .....mul
0050 74 69 00 00 00 00 00 00 00 1d 03 43 4e fa 00 00 ti...... ...CN...
0060 00 01 42 42 00 00 00 01 35 54 45 53 54 2c 2c 31 ..BB.... 5TEST,,1
0070 33 34 32 34 32 39 37 30 31 33 32 37 2e 66 31 62 34242970 1327.f1b
0080 64 33 62 33 64 32 64 30 35 62 64 35 34 64 38 36 d3b3d2d0 5bd54d86
0090 30 30 64 64 30 39 35 61 32 35 62 66 61 2e 00 00 00dd095a 25bfa...
00a0 00 01 41 41 40 23 02 05 72 6f 77 2d 31 7f ff ff ..AA@#.. row-1...
00b0 ff ff ff ff ff ff ff ff ff ff ff ff ff 01 00 00 ........ ........
00c0 00 01 04 64 61 74 61 00 00 00 01 00 00 00 24 00 ...data. ......$.
00d0 00 00 24 00 00 00 19 00 00 00 03 00 05 72 6f 77 ..$..... .....row
00e0 2d 31 04 64 61 74 61 6e 61 6d 65 7f ff ff ff ff -1.datan ame.....
00f0 ff ff ff 04 4a 6f 65 00 00 00 00 00 00 00 00 0e ....Joe. ........
0100 11 0e ..
We seen that there are 4 steps for the clients to connect to the actual region and read/write the data. Does it have some performance impact on the client? If yes, can we avoid it? Watch the next blog to get the answers...
great post Prafull
ReplyDeleteGood one, Helps to understand.
ReplyDeleteThis comment has been removed by the author.
ReplyDeleteprafull,
ReplyDeleteYou have mentioned that the information(ZooKeeper/-ROOT-/.META. and User Table) is cached at the client-end and these steps happens only for the first time.
What will happen when the same client queries for a same table?
--will it go straightly to .meta table or the region?
What will happen when the same client queries for a different table?
--will it go straightly to .meta table for look-up of region?
Guys,
ReplyDeleteThis post is very old but I have one question.
What will happen if HBASE master goes down.
Who is responsible for locating root region server ? Why HBase cluster functions even if master goes down.