
Hadoop streaming grep does not work

开发者 https://www.devze.com 2023-01-17 05:28 Source: web

grep does not seem to work with Hadoop streaming.

When I run:

hadoop jar /usr/local/hadoop-0.20.2/contrib/streaming/hadoop-0.20.2-streaming.jar -input /user/root/tmp2/user.data -output /user/root/selected_data -mapper '/bin/grep 1938678460' -reducer 'wc' -jobconf mapred.output.compress=false

I get:

java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 1
    at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:311)
    at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:545)
    at org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:132)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:57)
    at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:36)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
    at org.apache.hadoop.mapred.Child.main(Child.java:17

Any ideas?

I also tried -mapper 'cat' -reducer '/bin/grep 1938678460'. cat works; grep does not.

I also checked that /bin/grep exists on every machine, and it does.

Does grep not work with streaming, or am I missing something?


I haven't tried this myself, but grep exits with a non-zero status when it finds no matching lines. If the input handed to a map task doesn't contain the string you grep for, grep exits with code 1 and Hadoop treats the task as failed, which matches the "subprocess failed with code 1" in your trace. Maybe something like "/bin/grep 1938678460 || true" works.
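
One caveat with "|| true": it only helps if the mapper command is run through a shell, and I'm not sure streaming does that, so treat that part as an assumption to check. A small wrapper script avoids the question. Here is a minimal sketch; the function name grep_ok is made up for illustration, and the demo input is invented. It maps grep's "no match" status (1) to success while still passing through real errors (status 2 or higher):

```shell
#!/bin/sh
# grep_ok is a hypothetical helper: run grep with the given arguments,
# but report "no lines matched" (exit status 1) as success.
grep_ok() {
    status=0
    grep "$@" || status=$?
    # 0 = at least one match, 1 = no match: both are fine for a mapper.
    if [ "$status" -le 1 ]; then
        return 0
    fi
    # 2 or higher means grep itself failed (bad option, unreadable file, ...).
    return "$status"
}

# Demo with input that does not contain the pattern.
printf 'foo\nbar\n' | grep_ok 1938678460
echo "exit status: $?"    # prints "exit status: 0"
```

For the real job, the script would end with a call such as grep_ok 1938678460 instead of the demo lines, and it would presumably need to be shipped to the task nodes (for example with streaming's -file option) and named as the -mapper.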
