Get HBase Row Keys in Range without Retrieving Data?

Developer https://www.devze.com 2023-02-26 05:43 Source: web

Is there a way to retrieve the row keys in a given range without actually retrieving the columns/CFs associated with that row key?

For clarification: In my example, our table's row keys are stock ticker names (e.g. GOOG), and in our web app we'd like to populate an autocomplete widget using just the row keys we have in the database. Obviously, if we retrieve all the data (instead of only the stock names) for all the stocks between G and H when a user types 'G', we'll be unnecessarily straining our system. Any ideas?

According to the official documentation, you can optimally retrieve only the row keys using a combination of two filters: the KeyOnlyFilter and the FirstKeyOnlyFilter. (I think the "FirstKeyOnlyFilter" will return the key only once, even with large, complex rows.) If you only want keys in a given range, you can add that range to the scanner.

Here is some example code:

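import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.filter.FilterList;
import org.apache.hadoop.hbase.filter.FirstKeyOnlyFilter;
import org.apache.hadoop.hbase.filter.KeyOnlyFilter;

// both filters must pass: keep only the first cell of each row, with its value stripped
FilterList filters = new FilterList(FilterList.Operator.MUST_PASS_ALL,
            new FirstKeyOnlyFilter(),
            new KeyOnlyFilter());
Scan s = new Scan();
s.setFilter(filters);        // Scan has no constructor that takes only a filter
// in order to limit the scan to a range
s.setStartRow(startRowKey);  // first key in range (inclusive)
s.setStopRow(stopRowKey);    // key value after the last key in the range (exclusive)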

Source: https://hbase.apache.org/book.html#perf.hbase.client.rowkeyonly
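For the autocomplete case, the stop row is "the key just after every key that starts with what the user typed". A minimal sketch of how that could be computed in plain Java follows. The class and method names (PrefixRange, stopRowForPrefix) are hypothetical and not part of the HBase API. It assumes row keys compare as unsigned bytes in lexicographic order, and that an empty stop row means "scan to the end of the table".

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper, not part of the HBase client library.
final class PrefixRange {

    // Returns the smallest key that sorts after every key starting with `prefix`.
    // Returns an empty array when no such key exists (prefix empty or all 0xFF),
    // which is assumed here to mean "no upper bound".
    static byte[] stopRowForPrefix(byte[] prefix) {
        // Trailing 0xFF bytes cannot be incremented, so drop them first.
        int end = prefix.length;
        while (end > 0 && prefix[end - 1] == (byte) 0xFF) {
            end--;
        }
        if (end == 0) {
            return new byte[0];
        }
        byte[] stop = Arrays.copyOf(prefix, end);
        stop[end - 1]++; // bump the last remaining byte: "G" becomes "H"
        return stop;
    }

    public static void main(String[] args) {
        byte[] start = "G".getBytes(StandardCharsets.UTF_8);
        byte[] stop = stopRowForPrefix(start);
        System.out.println(new String(stop, StandardCharsets.UTF_8)); // prints "H"
    }
}
```

The typed prefix itself would serve as the start row and the returned array as the stop row, so typing 'GO' would scan from "GO" up to, but not including, "GP".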

Take a look at the filters (http://hbase.apache.org/book/client.filter.html), especially KeyOnlyFilter. The description of the filter (from http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/filter/package-summary.html) is:

A filter that will only return the key component of each KV (the value will be rewritten as empty).

To restrict the keys to a specific range, use the Scan(rowStart, rowEnd) constructor; the start row is inclusive and the end row is exclusive.

I would create a column family called 'empty:' and store an empty value in it for every row. Then you can request just the 'empty:' column family. This is not ideal, but it is better than loading column families with a lot of data.

You can use addFamily(byte[] family) or addColumn(byte[] family, byte[] qualifier) on the Scan to retrieve just the relevant data.

One approach would be to maintain a separate index table whose keys are all the possible FSA states (in effect, every prefix a user might type) for all the stocks. Then, whenever a user types 'G', all you have to do is hit this table and retrieve, say, a comma-separated list of all the values related to G.
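As a rough illustration of the index-table idea, here is an in-memory sketch in which a sorted map stands in for the HBase index table. Everything in it (the PrefixIndex class, its add and lookup methods, the sample tickers) is hypothetical and only meant to show the shape of the data: one row per prefix, holding the tickers that start with it.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// In-memory stand-in for the index table; not HBase code.
final class PrefixIndex {
    // Row key = typed prefix, value = sorted tickers starting with that prefix.
    private final Map<String, List<String>> rows = new TreeMap<>();

    // Registers a ticker under each of its prefixes: "GOOG" goes under "G", "GO", "GOO", "GOOG".
    void add(String ticker) {
        for (int len = 1; len <= ticker.length(); len++) {
            List<String> list = rows.computeIfAbsent(ticker.substring(0, len), k -> new ArrayList<>());
            int pos = Collections.binarySearch(list, ticker);
            if (pos < 0) {
                list.add(-pos - 1, ticker); // keep the list sorted, skip duplicates
            }
        }
    }

    // A single point lookup per keystroke; returns a comma-separated list.
    String lookup(String typed) {
        return String.join(",", rows.getOrDefault(typed, Collections.emptyList()));
    }

    public static void main(String[] args) {
        PrefixIndex index = new PrefixIndex();
        index.add("GOOG");
        index.add("GE");
        index.add("HD");
        System.out.println(index.lookup("G")); // prints "GE,GOOG"
    }
}
```

The trade-off is extra writes and storage (one index entry per prefix per ticker) in exchange for a single cheap read on each keystroke, instead of a range scan over the main table.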