HBase wide-column scanning and fetching


Let's say I've created a table:

rowkey (attrId+attr_value) //compound key

column => doc:doc1, doc:doc2, ...

When using the scan feature, I fetch one row at a time inside the iterator. What if the column qualifiers reach millions of entries? How do you loop through them, and will there be a caching issue?

Thanks.


Scans fetch rows. You can qualify a scan so that it only fetches given qualifiers or given families, but then that is all that will be returned from the scan (and you can only filter on data that is included in a scan).
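
For instance, narrowing a scan up front might look like this (a minimal sketch; the "doc" family and qualifier names are just taken from the question):

import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;

Scan s = new Scan();
// either fetch everything in the "doc" family ...
s.addFamily(Bytes.toBytes("doc"));
// ... or only specific qualifiers (addColumn overrides the family-wide request):
s.addColumn(Bytes.toBytes("doc"), Bytes.toBytes("doc1"));
s.addColumn(Bytes.toBytes("doc"), Bytes.toBytes("doc2"));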

If you have potentially millions of columns in a single row, that could be an issue: returning that row could mean a very large network transfer. If your row size exceeds your region size, it could also cause OOM errors on your region servers, and you will have inefficient storage (one row per region).

However, ignoring all of that, you can loop through the columns and column qualifiers in the client. You can get a Map from the result set that maps from families to qualifiers to values. But that is probably not what you really want to do.
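
For illustration, a minimal sketch of that client-side loop, assuming r is a Result returned by the scanner and "doc" is the family from the question:

import java.util.Map;
import java.util.NavigableMap;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;

// getFamilyMap returns qualifier -> value for one family, sorted by qualifier
NavigableMap<byte[], byte[]> docs = r.getFamilyMap(Bytes.toBytes("doc"));
for (Map.Entry<byte[], byte[]> entry : docs.entrySet()) {
    String qualifier = Bytes.toString(entry.getKey());   // e.g. "doc1"
    byte[] value = entry.getValue();
    // ... handle one column ...
}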


You can work around giant row fetches with a mixture of scans and column filters:

import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.filter.ColumnRangeFilter;
import org.apache.hadoop.hbase.filter.Filter;
import org.apache.hadoop.hbase.util.Bytes;

Scan s = new Scan();
s.setStartRow(Bytes.toBytes("some-row-key"));  // start/stop rows are byte[]
s.setStopRow(Bytes.toBytes("some-row-key"));   // same key limits the scan to this row
Filter f = new ColumnRangeFilter(Bytes.toBytes("doc0000"), true,
                                 Bytes.toBytes("doc0100"), false);
s.setFilter(f);

Source: http://hadoop-hbase.blogspot.com/2012/01/hbase-intra-row-scanning.html
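
Building on that, here is a hedged sketch (my own illustration, not code from the blog post) of paging through all columns of one wide row by re-issuing the scan with the filter's lower bound advanced past the last qualifier seen. It assumes the newer client API (Table, Cell) and a hypothetical helper name, scanAllColumns:

import java.io.IOException;
import org.apache.hadoop.hbase.Cell;
import org.apache.hadoop.hbase.CellUtil;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.filter.ColumnRangeFilter;
import org.apache.hadoop.hbase.util.Bytes;

// Hypothetical helper: page through every column of one wide row.
void scanAllColumns(Table table, byte[] rowKey) throws IOException {
    final int PAGE = 1000;
    byte[] lowerBound = null;              // null = before the first qualifier
    boolean inclusive = true;
    while (true) {
        Scan s = new Scan(rowKey, rowKey);
        s.setFilter(new ColumnRangeFilter(lowerBound, inclusive, null, true));
        s.setBatch(PAGE);                  // cap cells per Result
        try (ResultScanner scanner = table.getScanner(s)) {
            Result page = scanner.next();  // first batch of at most PAGE cells
            if (page == null || page.isEmpty()) {
                return;                    // no columns left
            }
            byte[] last = null;
            for (Cell cell : page.rawCells()) {
                last = CellUtil.cloneQualifier(cell);
                // ... process one column here ...
            }
            lowerBound = last;             // next page resumes strictly after
            inclusive = false;             // the last qualifier we processed
        }
    }
}

Because the lower bound is just a qualifier, this loop can checkpoint and resume between pages, which helps when a single row really does hold millions of columns.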


You can also limit the number of columns within a row returned at a time via Scan.setBatch.
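
A minimal sketch of that, again assuming an open Table named table: with a batch size set, one wide row comes back as a sequence of partial Results instead of a single huge one.

import org.apache.hadoop.hbase.Cell;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;

Scan s = new Scan();
s.setStartRow(Bytes.toBytes("some-row-key"));
s.setStopRow(Bytes.toBytes("some-row-key"));
s.setBatch(1000);                          // at most 1000 cells per Result
try (ResultScanner scanner = table.getScanner(s)) {
    for (Result partial : scanner) {       // the row arrives in chunks
        for (Cell cell : partial.rawCells()) {
            // ... process each cell without holding the whole row in memory ...
        }
    }
}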

