Hadoop Input files Order

Developer https://www.devze.com 2023-02-06 20:23 Source: Internet
I have data files arranged in folders named by date. The directory structure is:

  • /data/2011/01/01
  • /data/2011/01/02

and so on. Each directory contains around 50 files that I need to parse, and I am giving Hadoop the input path /data/**/**/** so that it parses all the files. My questions are:

  1. How can I ask Hadoop to order the input? I need to parse the files date by date.
  2. While parsing the files for a particular date, I need to preload a data structure that is associated with that date and stored in the same date directory.

Thanks Ankush


  1. You can't order the input. In the "worst case", if the cluster can run as many tasks at once as there are input files, all the files will be processed in parallel at the same moment.
  2. Perhaps you can create a custom implementation of "FileInputFormat" that reads the required config file and does what you need?
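
For the second point, a lighter alternative to a full custom input format may be to derive the date directory from the path of the file being read and load the side data once per directory. The following is only a minimal sketch in plain Java, not Hadoop's API: the class name DateSideDataCache and the loader function are hypothetical, and it assumes every input file sits directly inside its date directory, as in /data/2011/01/02, with zero-padded month and day. In a mapper written against the org.apache.hadoop.mapreduce API with a file-based input format, the file path would typically come from ((FileSplit) context.getInputSplit()).getPath().

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical helper (not part of Hadoop): caches one side-data object per date directory.
class DateSideDataCache<T> {
    // Assumed to be supplied by the job: loads the side data stored in a date directory.
    private final Function<String, T> loader;
    private final Map<String, T> cache = new HashMap<>();
    private int loads = 0;

    DateSideDataCache(Function<String, T> loader) {
        this.loader = loader;
    }

    // "/data/2011/01/02/part-00000" -> "/data/2011/01/02"
    static String dateDirOf(String filePath) {
        int slash = filePath.lastIndexOf('/');
        if (slash <= 0) {
            throw new IllegalArgumentException("No parent directory in: " + filePath);
        }
        return filePath.substring(0, slash);
    }

    // "/data/2011/01/02/part-00000" -> "2011-01-02".
    // With zero-padded components these keys sort chronologically as strings.
    static String dateKeyOf(String filePath) {
        String[] parts = dateDirOf(filePath).split("/");
        int n = parts.length;
        if (n < 3) {
            throw new IllegalArgumentException("Expected .../yyyy/mm/dd/file: " + filePath);
        }
        return parts[n - 3] + "-" + parts[n - 2] + "-" + parts[n - 1];
    }

    // Returns the side data for the file's date directory, loading it on first use only.
    T forFile(String filePath) {
        return cache.computeIfAbsent(dateDirOf(filePath), dir -> {
            loads++;
            return loader.apply(dir);
        });
    }

    // Number of times the loader has actually been invoked.
    int loadCount() {
        return loads;
    }
}
```

On the first point, the map phase still runs in no particular order, but emitting dateKeyOf(path) as the leading part of the map output key is one way to get date-ordered processing on the reduce side, since each reducer receives its keys in sorted order.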