Multiple maps followed by one reduce with Hadoop and HBase

Developer https://www.devze.com 2023-02-21 06:15 Source: web
I have several HBase tables. I wish to run a map task on each table (each map being a different Mapper class, since each table contains heterogeneous data), followed by one reduce.

I cannot work out whether this is possible without explicitly reducing the data after each map into an interim SequenceFile.

Any help would be gratefully received.


It seems you can only run a MapReduce job over one table at a time (see TableMapReduceUtil). So your best bet is most probably what you suspected: write the output of each per-table job to an interim location (e.g. a SequenceFile or a temporary HBase table), then run a final MapReduce job that takes that location as its input and merges the results. Also, if every per-table job writes its output in a common format, you may not even need the final merge job.
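A minimal sketch of the dataflow described above, written as plain in-memory Java rather than Hadoop/HBase code, so it only illustrates the shape of the pipeline. The tables ("orders", "refunds"), the row types and the methods `mapOrders`, `mapRefunds` and `reduce` are all hypothetical. Each per-table mapper turns its own row type into a shared `(key, value)` record, the returned lists stand in for the interim SequenceFiles or temporary HBase table, and a single reduce consumes all of them.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/**
 * In-memory simulation (NOT the Hadoop/HBase API) of "one mapper per table,
 * common interim format, one final reduce". All names are hypothetical.
 */
public class MultiMapOneReduce {

    /** Common intermediate record that every per-table mapper emits. */
    record KV(String key, long value) {}

    /** Hypothetical row types standing in for two heterogeneous HBase tables. */
    record OrderRow(String customerId, long amountCents) {}
    record RefundRow(String orderRef, String customerId, long refundCents) {}

    /** Stage 1a: mapper for the "orders" table; the list plays the interim file. */
    static List<KV> mapOrders(List<OrderRow> rows) {
        List<KV> out = new ArrayList<>();
        for (OrderRow r : rows) {
            out.add(new KV(r.customerId(), r.amountCents()));
        }
        return out;
    }

    /** Stage 1b: a different mapper for the "refunds" table; emits negative amounts. */
    static List<KV> mapRefunds(List<RefundRow> rows) {
        List<KV> out = new ArrayList<>();
        for (RefundRow r : rows) {
            out.add(new KV(r.customerId(), -r.refundCents()));
        }
        return out;
    }

    /** Stage 2: one reduce over all interim outputs, grouped by key and summed. */
    @SafeVarargs
    static Map<String, Long> reduce(List<KV>... interimOutputs) {
        Map<String, Long> totals = new TreeMap<>();
        for (List<KV> interim : interimOutputs) {
            for (KV kv : interim) {
                totals.merge(kv.key(), kv.value(), Long::sum);
            }
        }
        return totals;
    }

    public static void main(String[] args) {
        List<KV> ordersOut = mapOrders(List.of(
                new OrderRow("alice", 500), new OrderRow("bob", 300), new OrderRow("alice", 200)));
        List<KV> refundsOut = mapRefunds(List.of(new RefundRow("o-1", "alice", 100)));
        System.out.println(reduce(ordersOut, refundsOut)); // prints {alice=600, bob=300}
    }
}
```

On a real cluster, stage 1 would presumably be one MapReduce job per table, each set up through TableMapReduceUtil with its own Mapper class, and stage 2 a separate job reading the interim locations. Because every stage-1 mapper already emits the common format, the stage-2 job only has to group and aggregate.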
