The value output from my map/reduce job is a bytewritable array, which is written to the output file part-00000 (Hadoop does this by default). I need this array in my next map function, so I want to keep it in the distributed cache. Can somebody tell me how to read from the output file (part-00000), which may not be a text file, and store it in the distributed cache?
My suggestion:
Create a new Hadoop job with the following properties:
- Use the directory containing all the part-... files as the input.
- Create a custom OutputFormat class that writes to your distributed cache.
Your job configuration should then look essentially like this:
    conf.setInputFormat(SequenceFileInputFormat.class);
    conf.setMapperClass(IdentityMapper.class);
    conf.setReducerClass(IdentityReducer.class);
    conf.setOutputFormat(DistributedCacheOutputFormat.class);
Have a look at the Yahoo Hadoop tutorial because it has some examples on this point: http://developer.yahoo.com/hadoop/tutorial/module5.html#outputformat
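On the "may not be a text file" point: binary values can contain newline bytes, so they have to be read back as binary records, not line by line. Below is a minimal stand-alone sketch in plain Java (no Hadoop dependency) of a length-prefixed round trip. It assumes the value is a BytesWritable, which serializes itself as a 4-byte length followed by the raw bytes. The class and method names are hypothetical, made up for illustration only. A real part file written as a SequenceFile also carries a header and the keys, so in practice you would read it with SequenceFile.Reader and not parse it by hand.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Arrays;

// Hypothetical illustration class, not part of Hadoop.
public class LengthPrefixedBytes {

    // Writes a 4-byte big-endian length followed by the raw payload bytes
    // (assumed here to mirror how a BytesWritable value is serialized).
    public static byte[] encode(byte[] payload) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(buffer);
            out.writeInt(payload.length);
            out.write(payload, 0, payload.length);
            out.flush();
            return buffer.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Reads the length prefix, then exactly that many payload bytes.
    public static byte[] decode(byte[] record) {
        try {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(record));
            int length = in.readInt();
            byte[] payload = new byte[length];
            in.readFully(payload);
            return payload;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // 0x0A and 0x0D are line terminators; a text reader would split here.
        byte[] original = {0x00, 0x0A, (byte) 0xFF, 0x0D};
        byte[] restored = decode(encode(original));
        System.out.println(Arrays.equals(original, restored)); // prints true
    }
}
```

The same idea applies inside the next job's mapper: open the cached file as a binary stream (or with the matching Hadoop reader) and load the array once during setup, before any map calls.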
HTH