I am working on an HBase MapReduce job and need to understand whether the columns in a single column family are returned sorted by their names (keys). If so, I wouldn't need to sort them in the shuffle/sort stage.
Thanks
I have a data model very similar to yours. On insertion, however, I set my own timestamp values on the Put object. I took a "seed" of the current time and appended an incrementing counter for each event I persisted in the batch, so every event got a distinct timestamp.
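A minimal sketch of that timestamp scheme, assuming "appended" means adding the counter to the seed. BatchTimestamps and forBatch are hypothetical names used only for illustration, not part of HBase, and the actual Put calls are left out:

```java
// Hypothetical helper, not part of HBase: generates one timestamp per event.
final class BatchTimestamps {
    // Returns `count` distinct, increasing timestamps starting at `seed`.
    // Assumption: the counter is added to the seed (seed, seed + 1, ...).
    static long[] forBatch(long seed, int count) {
        long[] timestamps = new long[count];
        for (int i = 0; i < count; i++) {
            timestamps[i] = seed + i;
        }
        return timestamps;
    }

    public static void main(String[] args) {
        // Seed the batch with the current time in milliseconds.
        long seed = System.currentTimeMillis();
        for (long ts : forBatch(seed, 3)) {
            // Each value would be set as the explicit timestamp on a Put.
            System.out.println(ts);
        }
    }
}
```

With this variant, two batches seeded within a few milliseconds of each other could produce overlapping timestamps, so it probably only suits cases where batches are spaced further apart than their size.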
When I pulled the results out from the Scan, I wrote a comparator:
import java.util.Comparator;
import org.apache.hadoop.hbase.KeyValue;

// Orders KeyValues by ascending cell timestamp.
public class KVTimestampComparator implements Comparator<KeyValue> {
    @Override
    public int compare(KeyValue kv1, KeyValue kv2) {
        // Long.compare avoids boxing the primitive timestamps.
        return Long.compare(kv1.getTimestamp(), kv2.getTimestamp());
    }
}
Then I sorted the raw row:
List<KeyValue> row = Arrays.asList(result.raw());
Collections.sort(row, new KVTimestampComparator());
I got this idea from the person who answered this question: Sorted results from hbase scanner
No, columns are not sorted. They are stored internally as key-value pairs in a long byte array. However, you should clarify what you actually need this for.