
Input to the Mapper in Hadoop

devze.com (https://www.devze.com), 2023-01-16 09:30, source: web

We can provide input files to the mapper as

FileInputFormat.setInputPaths(conf, inputPath);

Is it possible to pass an in-memory reference, say a DOM tree built by a DOM parser after parsing an XML file, as the input to the mapper function of the Hadoop framework?

What other possibilities are there?


No, you can't pass in-memory (RAM-based) data, such as a DOM tree, as mapper input.

The reason is that Hadoop applications are generally distributed across many physically separate machines, so a reference to memory in one JVM means nothing to a map task running on another node. The current version of Hadoop "only" supports distributing data through HDFS, which is a file system.
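Because only file-based data gets distributed, a DOM tree you already hold in the driver would first have to be turned back into XML text and written to a file. A minimal JDK-only sketch of that serialization step follows; the class name `DomToText`, the `sample()` helper and the `<records>`/`<record>` elements are hypothetical, and writing the resulting string to HDFS is not shown.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class DomToText {
    // Serializes an in-memory DOM tree to XML text. That text could then be
    // written to a file and used as ordinary mapper input.
    static String serialize(Document doc) throws Exception {
        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        t.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    // Hypothetical sample document standing in for the parsed XML file.
    static Document sample() throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        Element root = doc.createElement("records");
        Element rec = doc.createElement("record");
        rec.setTextContent("hello");
        root.appendChild(rec);
        doc.appendChild(root);
        return doc;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(serialize(sample()));
    }
}
```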

What you can do is make the DOM parsing a preprocessing step inside your mapper and simply specify your XML input file as the input. The easiest way to do that is to create your own subclass of FileInputFormat.
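The idea is that each map task receives the raw XML and builds the DOM itself. In real Hadoop code this would live in a `Mapper`, with a custom `FileInputFormat` subclass that overrides `isSplitable` to return false so one whole file reaches one mapper. The JDK-only sketch below simulates just the mapper side under that assumption: the `map` method, the class name `XmlMapSketch` and the `<record>` element are hypothetical, and emitted pairs are collected into a list instead of a Hadoop output collector.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

public class XmlMapSketch {
    // Hypothetical stand-in for a map() call whose value is the entire contents
    // of one XML file. The DOM is built here, inside the task, not in the driver.
    static List<String[]> map(String fileName, byte[] contents) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new ByteArrayInputStream(contents));
        NodeList records = doc.getElementsByTagName("record");
        List<String[]> emitted = new ArrayList<>();
        for (int i = 0; i < records.getLength(); i++) {
            // Emit (file name, text of each <record>) as a key/value pair.
            emitted.add(new String[] {fileName, records.item(i).getTextContent()});
        }
        return emitted;
    }

    public static void main(String[] args) throws Exception {
        byte[] xml = "<records><record>a</record><record>b</record></records>"
                .getBytes(StandardCharsets.UTF_8);
        for (String[] kv : map("input.xml", xml)) {
            System.out.println(kv[0] + "\t" + kv[1]);
        }
    }
}
```

Keeping the parse inside the task means only the file path travels through the job configuration, which matches the file-only input model described in the answer.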

HTH

