I am new to HDFS and MapReduce and am trying to calculate survey statistics. The input file is in this format: Age Points Sex Category - all four fields are numbers. Is this the correct start:
public static class MapClass extends MapReduceBase
        implements Mapper<IntWritable, IntWritable, IntWritable, IntWritable> {

    private final static IntWritable Age = new IntWritable(1);
    private IntWritable AgeCount = new IntWritable();

    public void map(Text key, Text value,
                    OutputCollector<IntWritable, IntWritable> output,
                    Reporter reporter) throws IOException {
        AgeCount.set(Integer.parseInt(value.toString()));
        output.collect(AgeCount, Age);
    }
}
My questions: 1. Is this a correct start? 2. If I want to collect other attributes like Sex and Points - can I just add more output.collect statements? I know I have to read the line and split it into attributes. 3. Where it says implements Mapper - I made all 4 type parameters IntWritable; is that correct?
The Mapper interface expects 4 type parameters in the following order: map input key, map input value, map output key, and map output value. Since you are reading text files, the map input key is the byte offset of the line (a LongWritable) and the map input value is the line itself (a Text), so IntWritable is wrong for both input types. Also, the types you specify in your MapClass definition do not match the types you pass to your map function (you declare IntWritable, IntWritable but pass Text, Text). Given that you are dealing with text files, your MapClass should be defined as follows:
public static class MapClass extends MapReduceBase implements Mapper<LongWritable, Text, IntWritable, IntWritable>
In essence, you get one input line of the text file per map call, which you parse into the fields you want, converting them to ints inside the map function. So your map function would then have the following definition:
public void map(LongWritable key, Text value, OutputCollector<IntWritable, IntWritable> output, Reporter reporter) throws IOException {...}
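Putting it together, here is a minimal sketch of the whole mapper, assuming the four fields are whitespace-separated and that you want to count occurrences of each age (the field layout and split logic are illustrative, not confirmed by your question):

// Needs: org.apache.hadoop.io.*, org.apache.hadoop.mapred.*, java.io.IOException
public static class MapClass extends MapReduceBase
        implements Mapper<LongWritable, Text, IntWritable, IntWritable> {

    private final static IntWritable one = new IntWritable(1);
    private final IntWritable age = new IntWritable();

    public void map(LongWritable key, Text value,
                    OutputCollector<IntWritable, IntWritable> output,
                    Reporter reporter) throws IOException {
        // Assumed layout: Age Points Sex Category, separated by whitespace
        String[] fields = value.toString().trim().split("\\s+");
        if (fields.length < 4) {
            return; // skip malformed lines
        }
        age.set(Integer.parseInt(fields[0]));
        // Emit (age, 1); a reducer can sum these to get a count per age
        output.collect(age, one);
    }
}

As for your second question: you could call output.collect more than once per line, but with an IntWritable key the reducer cannot tell an age of 25 apart from a points value of 25. A common workaround is to switch the output key to Text and prefix it, e.g. "age:25" and "points:25" (prefixes here are just an illustration), or to run a separate job per attribute.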