Hi, I am trying to run Apache Nutch 1.2 on Amazon's EMR. To do this I specify an input directory from S3, and I get the following error:

Fetcher: java.lang.IllegalArgumentException: This file system object (hdfs://ip-11-202-55-144.ec2.internal:9000) does not support access to the request path 's3n://crawlResults2/segments/20110823155002/crawl_fetch' You possibly called FileSystem.get(conf) when you should have called FileSystem.get(uri, conf) to obtain a file system supporting your path.
I understand the difference between FileSystem.get(uri, conf) and FileSystem.get(conf). If I were writing this code myself I would call FileSystem.get(uri, conf); however, I am trying to use existing Nutch code.
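To illustrate the distinction: FileSystem.get(conf) always returns the filesystem named by fs.default.name (hdfs://... on EMR), while FileSystem.get(uri, conf) selects a filesystem matching the URI's scheme. A minimal, self-contained sketch of the scheme mismatch behind the error (paths taken from the error message above; this uses only java.net.URI, not the Hadoop API):

```java
import java.net.URI;

public class FsSchemeDemo {
    // Extract the scheme a Hadoop FileSystem would be chosen by.
    static String scheme(String location) {
        return URI.create(location).getScheme();
    }

    public static void main(String[] args) {
        // The path Nutch asks for, versus the cluster's default filesystem.
        String requestPath = "s3n://crawlResults2/segments/20110823155002/crawl_fetch";
        String defaultFs   = "hdfs://ip-11-202-55-144.ec2.internal:9000";

        // FileSystem.get(conf) resolves against defaultFs; the schemes
        // differ, so the default filesystem cannot serve the s3n:// path.
        System.out.println(scheme(requestPath)); // s3n
        System.out.println(scheme(defaultFs));   // hdfs
    }
}
```

Since the Nutch code calls FileSystem.get(conf), the only lever without patching Nutch is the configuration, which is what the answer below targets.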
I asked this question, and someone told me that I needed to modify hadoop-site.xml to include the following properties: fs.default.name, fs.s3.awsAccessKeyId, and fs.s3.awsSecretAccessKey. I updated these properties in core-site.xml (hadoop-site.xml does not exist), but that didn't make a difference. Does anyone have any other ideas?
Thanks for the help.
Try specifying the default filesystem in hadoop-site.xml:

<property>
<name>fs.default.name</name>
<value>s3n://crawlResults2</value>
</property>

This tells Nutch that S3 should be used by default. Note that fs.default.name takes a filesystem URI, not a class name; org.apache.hadoop.fs.s3.S3FileSystem is the implementation class Hadoop already maps to the s3:// scheme via fs.s3.impl, so it is not a valid value here.
You only need to specify the properties fs.s3.awsAccessKeyId and fs.s3.awsSecretAccessKey when your S3 objects require authentication (an S3 object can be readable by all users, or only with credentials).
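Putting the pieces together, a sketch of what the relevant section of core-site.xml could look like, assuming the bucket from the error message and placeholder credentials; note that for s3n:// URIs the credential property names use the fs.s3n prefix rather than fs.s3:

```xml
<configuration>
  <!-- Default filesystem: a URI, matching the s3n:// paths in the error -->
  <property>
    <name>fs.default.name</name>
    <value>s3n://crawlResults2</value>
  </property>
  <!-- Only needed if the bucket is not publicly readable -->
  <property>
    <name>fs.s3n.awsAccessKeyId</name>
    <value>YOUR_ACCESS_KEY_ID</value>
  </property>
  <property>
    <name>fs.s3n.awsSecretAccessKey</name>
    <value>YOUR_SECRET_ACCESS_KEY</value>
  </property>
</configuration>
```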