I've installed mahout at bitnami AMI ami-02fb006b, (as well as several other ami's, otherwise I won't be asking the question) according to instructions provided here and here:
I'm always getting stuck when trying to run ./examples/bin/build-reuters.sh Here's the output of the command:
> Please select a number to choose the corresponding clustering
> algorithm
> 1. kmeans clustering
> 2. lda clustering Enter your choice : 1 ok. You chose 1 and we'll use
> kmeans Clustering Downloading Reuters-21578 % Total % Received %
> Xferd Average Speed Time Time Time Current
> Dload Upload Total Spent
> Left Speed 100 7959k 100 7959k 0 0 294k 0 0:00:26
> 0:00:26 --:--:-- 305k Extracting... Running on hadoop, using
> HADOOP_HOME=/usr/local/hadoop-0.20.2
> HADOOP_CONF_DIR=/usr/local/hadoop-0.20.2/conf MAHOUT-JOB:
> /usr/local/mahout-0.4/examples/target/mahout-examples-0.6-SNAPSHOT-job.jar
> 11/08/16 20:10:25 WARN driver.MahoutDriver: No
> org.apache.lucene.benchmark.utils.ExtractReuters.props found on
> classpath, will use command-line arguments only
> Deleting all files in mahout-work/reuters-out-tmp
> 11/08/16 20:10:30 INFO driver.MahoutDriver: Program took 4906 ms
> MAHOUT_LOCAL is set, running locally
> CLASSPATH:
> :/usr/local/mahout-0.4/src/conf:/usr/local/hadoop-0.20.2/conf:/usr/lib/jvm/java-6-openjdk//lib/tools.jar:/usr/local/mahout-0.4/mahout-*.jar:/usr/local/mahout-0.4/examples/target/mahout-examples-0.6-SNAPSHOT-job.jar:/usr/local/mahout-0.4/mahout-examples-*-job.jar:/usr/local/mahout-0.4/lib/*.jar:/usr/local/mahout-0.4/examples/target/dependency/antlr-2.7.7.jar:/usr/local/mahout-0.4/examples/target/dependency/antlr-3.2.jar:/usr/local/mahout-0.4/examples/target/dependency/antlr-runtime-3.2.jar:/usr/local/mahout-0.4/examples/target/dependency/avro-1.4.0-cassandra-1.jar:/usr/local/mahout-0.4/examples/target/dependency/bson-2.5.jar:/usr/local/mahout-0.4/examples/target/dependency/cassandra-all-0.8.1.jar:/usr/local/mahout-0.4/examples/target/dependency/cassandra-thrift-0.8.1.jar:/usr/local/mahout-0.4/examples/target/dependency/cglib-nodep-2.2.jar:/usr/local/mahout-0.4/examples/target/dependency/commons-beanutils-1.7.0.jar:/usr/local/mahout-0.4/examples/target/dependency/commons-beanutils-core-1.8.0.jar:/usr/local/mahout-0.4/examples/target/dependency/commons-cli-1.2.jar:/usr/local/mahout-0.4/examples/target/dependency/commons-cli-2.0-mahout.jar:/usr/local/mahout-0.4/examples/target/dependency/commons-codec-1.4.jar:/usr/local/mahout-0.4/examples/target/dependency/commons-collections-3.2.1.jar:/usr/local/mahout-0.4/examples/target/dependency/commons-compress-1.1.jar:/usr/local/mahout-0.4/examples/target/dependency/commons-configuration-1.6.jar:/usr/local/mahout-0.4/examples/target/dependency/commons-dbcp-1.4.jar:/usr/local/mahout-0.4/examples/target/dependency/commons-digester-1.7.jar:/usr/local/mahout-0.4/examples/target/dependency/commons-httpclient-3.0.1.jar:/usr/local/mahout-0.4/examples/target/dependency/commons-lang-2.6.jar:/usr/local/mahout-0.4/examples/target/dependency/commons-logging-1.1.1.jar:/usr/local/mahout-0.4/examples/target/dependency/commons-math-2.2.jar:/usr/local/mahout-0.4/examples/target/dependency/commons-pool-1.5.6.jar:/usr/local/mahout-0.4/examples/target/dependency/concurrentlinkedhashmap-lru-1.1.jar:/usr/local/mahout-0.4/examples/target/dependency/easymock-3.0.jar:/usr/local/mahout-0.4/examples/target/dependency/google-collections-1.0-rc2.jar:/usr/loc开发者_Python百科al/mahout-0.4/examples/target/dependency/guava-r09.jar:/usr/local/mahout-0.4/examples/target/dependency/hadoop-core-0.20.203.0.jar:/usr/local/mahout-0.4/examples/target/dependency/hector-core-0.8.0-2.jar:/usr/local/mahout-0.4/examples/target/dependency/high-scale-lib-1.1.2.jar:/usr/local/mahout-0.4/examples/target/dependency/httpclient-4.0.1.jar:/usr/local/mahout-0.4/examples/target/dependency/httpcore-4.0.1.jar:/usr/local/mahout-0.4/examples/target/dependency/jackson-core-asl-1.8.2.jar:/usr/local/mahout-0.4/examples/target/dependency/jackson-mapper-asl-1.8.2.jar:/usr/local/mahout-0.4/examples/target/dependency/jakarta-regexp-1.4.jar:/usr/local/mahout-0.4/examples/target/dependency/jamm-0.2.2.jar:/usr/local/mahout-0.4/examples/target/dependency/jcommon-1.0.12.jar:/usr/local/mahout-0.4/examples/target/dependency/jetty-6.1.22.jar:/usr/local/mahout-0.4/examples/target/dependency/jetty-util-6.1.22.jar:/usr/local/mahout-0.4/examples/target/dependency/jfreechart-1.0.13.jar:/usr/local/mahout-0.4/examples/target/dependency/jline-0.9.94.jar:/usr/local/mahout-0.4/examples/target/dependency/json-simple-1.1.jar:/usr/local/mahout-0.4/examples/target/dependency/jul-to-slf4j-1.6.1.jar:/usr/local/mahout-0.4/examples/target/dependency/junit-4.8.2.jar:/usr/local/mahout-0.4/examples/target/dependency/libthrift-0.6.1.jar:/usr/local/mahout-0.4/examples/target/dependency/log4j-1.2.16.jar:/usr/local/mahout-0.4/examples/target/dependency/lucene-analyzers-3.1.0.jar:/usr/local/mahout-0.4/examples/target/dependency/lucene-benchmark-3.1.0.jar:/usr/local/mahout-0.4/examples/target/dependency/lucene-core-3.1.0.jar:/usr/local/mahout-0.4/examples/target/dependency/lucene-highlighter-3.1.0.jar:/usr/local/mahout-0.4/examples/target/dependency/lucene-memory-3.1.0.jar:/usr/local/mahout-0.4/examples/target/dependency/lucene-queries-3.1.0.jar:/usr/local/mahout-0.4/examples/target/dependency/lucene-xercesImpl-3.1.0.jar:/usr/local/mahout-0.4/examples/target/dependency/mahout-collections-1.0.jar:/usr/local/mahout-0.4/examples/target/dependency/mahout-core-0.6-SNAPSHOT.jar:/usr/local/mahout-0.4/examples/target/dependency/mahout-core-0.6-SNAPSHOT-tests.jar:/usr/local/mahout-0.4/examples/target/dependency/mahout-integration-0.6-SNAPSHOT.jar:/usr/local/mahout-0.4/examples/target/dependency/mahout-math-0.6-SNAPSHOT.jar:/usr/local/mahout-0.4/examples/target/dependency/mahout-math-0.6-SNAPSHOT-tests.jar:/usr/local/mahout-0.4/examples/target/dependency/mongo-java-driver-2.5.jar:/usr/local/mahout-0.4/examples/target/dependency/objenesis-1.2.jar:/usr/local/mahout-0.4/examples/target/dependency/servlet-api-2.5-20081211.jar:/usr/local/mahout-0.4/examples/target/dependency/servlet-api-2.5.jar:/usr/local/mahout-0.4/examples/target/dependency/slf4j-api-1.6.1.jar:/usr/local/mahout-0.4/examples/target/dependency/slf4j-jcl-1.6.1.jar:/usr/local/mahout-0.4/examples/target/dependency/slf4j-log4j12-1.6.1.jar:/usr/local/mahout-0.4/examples/target/dependency/snakeyaml-1.6.jar:/usr/local/mahout-0.4/examples/target/dependency/solr-commons-csv-3.1.0.jar:/usr/local/mahout-0.4/examples/target/dependency/speed4j-0.9.jar:/usr/local/mahout-0.4/examples/target/dependency/stringtemplate-3.2.jar:/usr/local/mahout-0.4/examples/target/dependency/uncommons-maths-1.2.2.jar:/usr/local/mahout-0.4/examples/target/dependency/uuid-3.2.0.jar:/usr/local/mahout-0.4/examples/target/dependency/watchmaker-framework-0.6.2.jar:/usr/local/mahout-0.4/examples/target/dependency/watchmaker-swing-0.6.2.jar:/usr/local/mahout-0.4/examples/target/dependency/xml-apis-1.0.b2.jar:/usr/local/mahout-0.4/examples/target/dependency/xpp3_min-1.1.4c.jar:/usr/local/mahout-0.4/examples/target/dependency/xstream-1.3.1.jar
> SLF4J: Class path contains multiple SLF4J bindings. SLF4J: Found
> binding in
> [jar:file:/usr/local/mahout-0.4/examples/target/mahout-examples-0.6-SNAPSHOT-job.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: Found binding in
> [jar:file:/usr/local/mahout-0.4/examples/target/dependency/slf4j-jcl-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: Found binding in
> [jar:file:/usr/local/mahout-0.4/examples/target/dependency/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an
> explanation. WARNING: org.apache.hadoop.metrics.jvm.EventCounter is
> deprecated. Please use org.apache.hadoop.log.metrics.EventCounter in
> all the log4j.properties files. 11/08/16 20:10:32 INFO
> common.AbstractJob: Command line arguments: {--charset=UTF-8,
> --chunkSize=5, --endPhase=2147483647,
> --fileFilterClass=org.apache.mahout.text.PrefixAdditionFilter,
> --input=mahout-work/reuters-out, --keyPrefix=,
> --output=mahout-work/reuters-out-seqdir, --startPhase=0,
> --tempDir=temp} Exception in thread "main" java.io.IOException: Call
> to localhost/127.0.0.1:9000 failed on local exception:
> java.io.IOException: Broken pipe
> at
> org.apache.hadoop.ipc.Client.wrapException(Client.java:1065)
> at org.apache.hadoop.ipc.Client.call(Client.java:1033)
> at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:224)
> at $Proxy1.getProtocolVersion(Unknown Source)
> at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:364)
> at
> org.apache.hadoop.hdfs.DFSClient.createRPCNamenode(DFSClient.java:106)
> at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:208)
> at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:175)
> at
> org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:89)
> at
> org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1310)
> at
> org.apache.hadoop.fs.FileSystem.access$100(FileSystem.java:65)
> at
> org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1328)
> at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:226)
> at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:109)
> at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:210)
> at
> org.apache.mahout.text.SequenceFilesFromDirectory.run(SequenceFilesFromDirectory.java:59)
> at
> org.apache.mahout.text.SequenceFilesFromDirectory.run(SequenceFilesFromDirectory.java:110)
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
> at
> org.apache.mahout.text.SequenceFilesFromDirectory.main(SequenceFilesFromDirectory.java:85)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:616)
> at
> org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
> at
> org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
> at
> org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:188)
> Caused by: java.io.IOException: Broken pipe
> at sun.nio.ch.FileDispatcher.write0(Native Method)
> at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:47)
> at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:122)
> at sun.nio.ch.IOUtil.write(IOUtil.java:93)
> at
> sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:352)
> at
> org.apache.hadoop.net.SocketOutputStream$Writer.performIO(SocketOutputStream.java:55)
> at
> org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:142)
> at
> org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:146)
> at
> org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:107)
> at
> java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
> at
> java.io.BufferedOutputStream.flush(BufferedOutputStream.java:140)
> at java.io.DataOutputStream.flush(DataOutputStream.java:123)
> at
> org.apache.hadoop.ipc.Client$Connection.sendParam(Client.java:746)
> at org.apache.hadoop.ipc.Client.call(Client.java:1011)
> ... 25 more rmr: cannot remove mahout-work/reuters-out-seqdir:
> No such file or directory. put: File mahout-work/reuters-out-seqdir
> does not exist.
this is a consistent error and I am getting it in every single installation I attempt.
What do I do to fix this?
This looks like an error from Hadoop and/or EC2. Hadoop workers failed writing data to each other for some reason. Why, I don't know, but I might guess that ports aren't open, even locally.
I have always used Amazon EMR directly instead.
Maybe you can debug by trying another M/R job to test. It is not related to Mahout directly as far as I can tell.
精彩评论