How to save only non-empty reducers' output in HDFS

Developer https://www.devze.com 2023-03-08 17:41 Source: the web
In my application, the reducers save all of their part files to HDFS, but I want a reducer to write its part file only when that file would not be 0 bytes. Please let me know how to configure this.


It is possible - see the documentation section on "Lazy Output":

http://hadoop.apache.org/mapreduce/docs/current/mapred_tutorial.html#Lazy+Output+Creation

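import org.apache.hadoop.mapreduce.lib.output.LazyOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
// Wrap TextOutputFormat so a part file is created only when the first record is written
LazyOutputFormat.setOutputFormatClass(job, TextOutputFormat.class);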
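
To make it clearer why this avoids 0-byte part files, here is a rough, Hadoop-free sketch of the idea behind lazy output: the file is opened on the first write rather than when the writer is constructed. The class names LazyPartWriter and LazyOutputDemo and the method partFileExistsAfter are made up for illustration and are not part of the Hadoop API; the real LazyOutputFormat applies the same idea to the wrapped output format's record writer.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical stand-in for a record writer; NOT a Hadoop class.
final class LazyPartWriter implements AutoCloseable {
    private final Path partFile;
    private BufferedWriter out; // stays null until the first record arrives

    LazyPartWriter(Path partFile) {
        this.partFile = partFile;
    }

    void write(String key, String value) {
        try {
            if (out == null) {
                // The part file is created here, on the first write only.
                out = Files.newBufferedWriter(partFile);
            }
            out.write(key + "\t" + value);
            out.newLine();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    @Override
    public void close() {
        try {
            // Nothing was ever opened if no record was written.
            if (out != null) {
                out.close();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}

final class LazyOutputDemo {
    // Simulates one reducer emitting `records` records and reports
    // whether a part file ended up on disk.
    static boolean partFileExistsAfter(int records) {
        try {
            Path dir = Files.createTempDirectory("lazy-output-demo");
            Path part = dir.resolve("part-r-00000");
            try (LazyPartWriter writer = new LazyPartWriter(part)) {
                for (int i = 0; i < records; i++) {
                    writer.write("key" + i, "value" + i);
                }
            }
            return Files.exists(part);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("0 records -> part file exists: " + partFileExistsAfter(0));
        System.out.println("3 records -> part file exists: " + partFileExistsAfter(3));
    }
}
```

In this sketch a simulated reducer that emits nothing leaves no file behind, while one that emits at least one record gets its part file as usual.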


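If you're using the old API (org.apache.hadoop.mapred), you can use the NullOutputFormat class. Be aware that NullOutputFormat discards every record sent to the job's default output, so it suppresses the part files entirely; it only fits when the reducer writes its real output through some other channel, such as MultipleOutputs: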

import org.apache.hadoop.mapred.lib.NullOutputFormat;
conf.setOutputFormat(NullOutputFormat.class);