I'm trying to combine multiple files in multiple input directories into a single file, for various odd reasons I won't go into. My first attempt was to write a 'null' mapper and reducer that just copied input to output, but that failed. My latest attempt is:
vcm_hadoop lester jar /vcm/home/apps/hadoop/contrib/streaming/hadoop-*-streaming.jar -input /cruncher/201004/08/17/00 -output /lcuffcat9 -mapper /bin/cat -reducer NONE
but I still end up with multiple output files. Does anybody know how I can coax everything into a single output file?
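Keep /bin/cat as the mapper, but instead of -reducer NONE use /bin/cat as the reducer as well, and make sure the number of reducers is set to one (for example with -numReduceTasks 1). With -reducer NONE, each map task writes its own part file, which is why you get several outputs; with a single reducer, everything ends up in one part file. Note that the output will also have gone through the sort phase, so the lines come out sorted by key rather than in their original order.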
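You need a reducer for this because the number of map tasks is only a hint (Hadoop derives the actual number from the input splits), whereas the number of reduce tasks you request is the number you get.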
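To see why the single-reducer output is sorted, here is a rough local model of a cat/cat job with one reducer. It is a sketch of the data flow, not Hadoop's own code: it assumes streaming's default of treating the text before the first tab as the key, and the function names are made up for illustration.

```python
from typing import Iterable, List


def streaming_key(record: str) -> str:
    # Streaming's default: the key is everything before the first tab
    # (the whole line if there is no tab).
    return record.split("\t", 1)[0]


def identity_job_single_reducer(inputs: Iterable[Iterable[str]]) -> List[str]:
    """Hypothetical local model of '-mapper /bin/cat -reducer /bin/cat -numReduceTasks 1'."""
    # Map phase: /bin/cat passes every line of every input file through unchanged.
    mapped = [line for lines in inputs for line in lines]
    # Shuffle/sort: with one reducer, every record lands in the same partition,
    # sorted by key. Python's sort is stable, so equal keys keep their input
    # order here; a real job makes no such promise.
    shuffled = sorted(mapped, key=streaming_key)
    # Reduce phase: /bin/cat copies the sorted stream into the single output file.
    return shuffled


if __name__ == "__main__":
    # Two "files" go in; one sorted stream comes out.
    print(identity_job_single_reducer([["b", "a"], ["d", "c"]]))
```

Running it prints `['a', 'b', 'c', 'd']`: one output, but the original per-file line order is gone.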
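If you want to keep the original line order rather than have it sorted, you could have your mappers take filenames as input, read each named file, and emit the filename plus line number as the key and the line itself as the value; the reducer then throws away the key and outputs only the value. Because the sorter orders records by that key, the lines come out grouped by file and in their original order within each file, provided the line numbers are zero-padded so that they sort correctly as text.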
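A minimal Python sketch of that mapper and reducer follows. Assumptions, all hypothetical: the script name `concat.py`, the `map`/`reduce` mode argument, and the `name:lineno` key format with a single ':' separator (which assumes paths contain no ':'). It also assumes each listed path can be opened with `open()` from inside the task; paths that live only in HDFS would need a different way of reading them.

```python
import sys
from typing import Iterable, Iterator


def map_lines(name: str, lines: Iterable[str]) -> Iterator[str]:
    """Turn the lines of one file into 'name:lineno<TAB>line' records."""
    for lineno, line in enumerate(lines, start=1):
        text = line.rstrip("\n")
        # Zero-padding makes the sorter's text comparison match numeric line
        # order (otherwise line 10 would sort before line 2).
        yield f"{name}:{lineno:012d}\t{text}"


def map_records(filenames: Iterable[str]) -> Iterator[str]:
    """Mapper: each input record is a path; emit one record per line of that file."""
    for raw in filenames:
        name = raw.strip()
        if not name:
            continue
        # Assumes the path is readable from the task with a plain open().
        with open(name) as f:
            yield from map_lines(name, f)


def reduce_records(records: Iterable[str]) -> Iterator[str]:
    """Reducer: drop the key (text before the first tab) and emit the value."""
    for record in records:
        _key, _sep, value = record.rstrip("\n").partition("\t")
        yield value


def main() -> None:
    # 'python concat.py map' runs the mapper; 'python concat.py reduce' the reducer.
    mode = sys.argv[1] if len(sys.argv) > 1 else "map"
    step = map_records if mode == "map" else reduce_records
    for out in step(sys.stdin):
        print(out)


if __name__ == "__main__":
    main()
```

The job input would then be a text file listing one path per line, with something along the lines of `-mapper "python concat.py map" -reducer "python concat.py reduce" -numReduceTasks 1 -file concat.py`. Putting the filename and line number into one key field avoids having to change streaming's default of splitting the key at the first tab.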