I'm using the Thrift interface (http://apache.mesi.com.ar//incubator/thrift/0.5.0-incubating/) to access HBase on my cluster. I can connect, get and display records, and scan using start and stop rows.
The documentation (http://hbase.apache.org/docs/r0.89.20100924/apidocs/org/apache/hadoop/hbase/thrift/doc-files/Hbase.html#Fn_Hbase_scannerOpenWithStop) says: "It's also possible to pass a regex in the column qualifier."
My question is simple: how?
My working code:

    int scannerId = client.scannerOpenWithStop(
        "TABLE_NAME".ToByteArray(),
        "START_ROW".ToByteArray(),
        "STOP_ROW".ToByteArray(),
        new List<string> { "COLUMN_FAMILY" }.ToByteArrayList());
`ToByteArray()` and `ToByteArrayList()` are extension methods; `ToByteArrayList()` calls `ToByteArray()` on each string and collects the results in a list. I'm including `ToByteArray()` below in case my string-to-`byte[]` conversion is causing problems.
    public static byte[] ToByteArray(this string s)
    {
        // Encode the string as UTF-8 bytes
        System.Text.UTF8Encoding encoding = new System.Text.UTF8Encoding();
        return encoding.GetBytes(s);
    }
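For completeness, here is a sketch of what `ToByteArrayList()` might look like as described above. The class name `ByteConversionExtensions` is hypothetical, `ToByteArray()` is restated (using `Encoding.UTF8`, which behaves the same for `GetBytes`) so the sketch compiles on its own, and it assumes the generated Thrift client takes a `List<byte[]>` for the columns argument, as the call above implies. As for the conversion itself, HBase's Java `Bytes.toBytes(String)` also encodes as UTF-8, so this should match keys and column names written that way.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text;

// Hypothetical container class; C# extension methods must live in a static class.
public static class ByteConversionExtensions
{
    // Encode a string as UTF-8 bytes (Thrift passes HBase names and keys as raw bytes).
    public static byte[] ToByteArray(this string s)
    {
        return Encoding.UTF8.GetBytes(s);
    }

    // Convert each string with ToByteArray() and collect the results in a new list.
    public static List<byte[]> ToByteArrayList(this IEnumerable<string> strings)
    {
        return strings.Select(s => s.ToByteArray()).ToList();
    }
}
```

Usage matches the call above: `new List<string> { "COLUMN_FAMILY" }.ToByteArrayList()` yields a one-element list holding the UTF-8 bytes of `COLUMN_FAMILY`.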
When I run the code above, it returns data, and the results change as I vary `START_ROW` and `STOP_ROW`. If I append a colon (`:`) to the list entry, making it `COLUMN_FAMILY:`, it still returns the data. If I give a full column name, such as `COLUMN_FAMILY:http://www.myurl.com/more/goes/here`, it gets all values for that URL.

What I want is to pass `COLUMN_FAMILY:http://www.myurl.com/.*` (or some other regex) and have it return the matching data, as the documentation seems to say it can.
An example or two is all I should need. I figure there is some formatting or trick I'm missing to get the regex working.
    COLUMN_FAMILY:/(?i:^http://www.myurl.com)/

It looks like the leading and trailing `/` tell it that the contents should be parsed as a regular expression (see http://blog.hypertable.com/?cat=1).
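A minimal sketch of building such a column spec, assuming the server treats a qualifier wrapped in `/.../` as a regular expression, as in the example above. The class `ColumnSpecs` and its methods are hypothetical helpers. The sketch escapes the literal URL with `Regex.Escape`, so each `.` matches only a dot (the unescaped `.` in `www.myurl.com` matches any character). A pattern can be sanity-checked locally with .NET's `Regex`, but the server's regex engine may differ from .NET's in details, so a local match does not guarantee the same result from the server.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helpers for building regex column specs.
public static class ColumnSpecs
{
    // Wrap the pattern in leading/trailing slashes after "family:",
    // following the "family:/pattern/" form from the example above.
    public static string RegexColumnSpec(string family, string pattern)
    {
        return family + ":/" + pattern + "/";
    }

    // Case-insensitive "starts with this literal prefix" pattern in the
    // same (?i:^...) form as the example; Regex.Escape turns "." into "\.".
    public static string PrefixPattern(string literalPrefix)
    {
        return "(?i:^" + Regex.Escape(literalPrefix) + ")";
    }
}
```

For example, `ColumnSpecs.RegexColumnSpec("COLUMN_FAMILY", ColumnSpecs.PrefixPattern("http://www.myurl.com"))` yields `COLUMN_FAMILY:/(?i:^http://www\.myurl\.com)/`, which would go into the `List<string>` passed through `ToByteArrayList()` in place of `"COLUMN_FAMILY"`.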