Specify Hadoop mapreduce input keys directly (not from a file)

Developer https://www.devze.com 2023-01-27 07:08 Source: web
I'd like to generate some data using MapReduce. I'd like to invoke the job with one parameter N, and have map() called exactly once for each integer from 1 to N.

Obviously I want a Mapper<IntWritable, NullWritable, <my output types>>... that's easy. But I can't figure out how to generate the input data! Is there an InputFormat I'm not seeing somewhere that lets me just pull keys + values from a collection directly?


Do you want each mapper to process all integers from 1 to N? Or do you want to distribute the processing of integers 1 to N across the concurrently running mappers?

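If the former, I believe you'll need to create a custom InputFormat. If the latter, the easiest way might be to generate a text file with the integers 1 to N, one integer per line, and read it with a line-based input format such as TextInputFormat, or NLineInputFormat if you want to control how many lines go to each mapper.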
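
For the text-file route, here is a minimal sketch of the generator step in plain Java. The class name IntegerInputFile and its write method are hypothetical helpers for illustration, not part of Hadoop, and the sketch assumes the file is written locally and copied to HDFS afterwards.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper (not part of Hadoop): writes 1..N, one integer per line.
class IntegerInputFile {

    // Writes the integers 1 through n to 'target', one per line.
    // An existing file at 'target' is truncated and overwritten.
    static void write(Path target, int n) throws IOException {
        try (BufferedWriter out = Files.newBufferedWriter(target, StandardCharsets.UTF_8)) {
            for (int i = 1; i <= n; i++) {
                out.write(Integer.toString(i));
                out.write('\n');
            }
        }
    }

    // Usage: java IntegerInputFile.java <output-file> <N>
    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: IntegerInputFile <output-file> <N>");
            System.exit(1);
        }
        write(Paths.get(args[0]), Integer.parseInt(args[1]));
    }
}
```

Note that with TextInputFormat the map input key is the byte offset of the line (a LongWritable) and the value is the line itself (a Text), so the mapper would be declared as Mapper<LongWritable, Text, ...> and would recover the integer with something like Integer.parseInt(value.toString()), rather than receiving an IntWritable key directly.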
