Easiest non-Java way to write HBase MapReduce on CDH3?_问答_开发者

Easiest non-Java way to write HBase MapReduce on CDH3?

开发者 https://www.devze.com 2023-02-01 21:50 出处：网络

I\'ve been working on this for a long time, and I feel very worn out; I\'m hoping for an [obvious?] insight from SO community that might get my pet project back on the move, so I can stop kicking myse

I've been working on this for a long time, and I feel very worn out; I'm hoping for an [obvious?] insight from SO community that might get my pet project back on the move, so I can stop kicking myself. I'm using Cloudera CDH3, HBase .89 and Hadoop .20.

I have a Python/Django app that writes data to a single HBase table using the Thrift interface, and that works great. Now I want to Map/Reduce it into some more HBase tables.

The obvious answer here is either Dumbo or Apache PIG, but with Pig, the HBaseStorage adapter support isn't available for my version yet (Pig is able to load the classes and definitions, but freezes at the "Map" step, complaining about "Input Splits"; Pig mailing lists suggest this is fixed in Pig 0.8, which is incompatible with CDH3 Hadoop, so I'd have to use edge versions of everything [i think]). I can't find any information on how to make Dumbo use HBaseStorage as a data sink.

I don't care if it's Python, Ruby, Scala, Clojure, Jython, JRuby or even PHP, I ju开发者_C百科st really don't want to write Java (for lots of reasons, most of them involving the sinking feeling I get every time I have to convert an Int() to IntWritable() etc).

I've tried literally every last solution and example I can find (for the last 4 weeks) for writing HBase Map/Reduce jobs in alternative languages, but everything seems to be either outdated or incomplete. Please, Stack Overflow, save me from my own devices!

It's not precisely an answer, but it's the closest I got --

I asked in #hbase on irc.freenode.net yesterday, and one of the Cloudera employees responded. The "Input Splits" problem I'm having with Pig is specific to Pig 0.7, and Pig 0.8 will be bundled with Cloudera CDH3 Beta 4 (no ETA on that). Therefore, what I want to do (easily write M/R jobs using HBase tables as both sink and source) will be possible in their next release. It also seems that the HBaseStorage class will be generally improved to help with read/write operations from ANY JVM language, as well, making Jython, JRuby, Scala and Clojure all much more feasible as well.

So the answer to the question, at this time, is "Wait for CDH3 Beta 4", or if you're impatient, "Download the latest version of Pig and pray that it's compatible with your HBase"