I'm a newbie to Hadoop.
I recently implemented the WordCount example.
But when I run it on my single node with 2 input files containing just 9 words, it takes nearly 33 seconds to finish! That seems crazy to me, and it has me really confused.
Can anyone tell me whether this is normal?
How can I fix this? Remember, I only created 2 input files with 9 words in them. Here is the relevant part of the job status:
Submit Host Address: 127.0.0.1
Job-ACLs: All users are allowed
Job Setup: Successful
Status: Succeeded
Started at: Fri Aug 05 14:27:22 CST 2011
Finished at: Fri Aug 05 14:27:53 CST 2011
Finished in: 30sec
Hadoop is not efficient for very small jobs, since it spends extra time on JVM startup, process initialization, and so on. It can, however, be optimized to some extent by enabling JVM reuse (see the sketch below the link).
http://hadoop.apache.org/common/docs/r0.20.2/mapred_tutorial.html#Task+JVM+Reuse
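For illustration, here is a minimal sketch of what that looks like in a driver: it is essentially the classic 0.20-era WordCount from the linked tutorial, with one added `setNumTasksToExecutePerJvm` call to turn on JVM reuse (the class and job names are just the tutorial's defaults, not anything from your code):

```java
import java.io.IOException;
import java.util.Iterator;
import java.util.StringTokenizer;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.*;

public class WordCount {

    public static class Map extends MapReduceBase
            implements Mapper<LongWritable, Text, Text, IntWritable> {
        private final static IntWritable one = new IntWritable(1);
        private final Text word = new Text();

        // Emit (word, 1) for every token in the input line.
        public void map(LongWritable key, Text value,
                        OutputCollector<Text, IntWritable> output,
                        Reporter reporter) throws IOException {
            StringTokenizer tokenizer = new StringTokenizer(value.toString());
            while (tokenizer.hasMoreTokens()) {
                word.set(tokenizer.nextToken());
                output.collect(word, one);
            }
        }
    }

    public static class Reduce extends MapReduceBase
            implements Reducer<Text, IntWritable, Text, IntWritable> {
        // Sum the counts for each word.
        public void reduce(Text key, Iterator<IntWritable> values,
                           OutputCollector<Text, IntWritable> output,
                           Reporter reporter) throws IOException {
            int sum = 0;
            while (values.hasNext()) {
                sum += values.next().get();
            }
            output.collect(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(WordCount.class);
        conf.setJobName("wordcount");

        // Reuse each task JVM for an unlimited number of tasks of this job
        // (-1); the default is 1, i.e. a fresh JVM per task. This is the
        // same knob as mapred.job.reuse.jvm.num.tasks in the job config.
        conf.setNumTasksToExecutePerJvm(-1);

        conf.setOutputKeyClass(Text.class);
        conf.setOutputValueClass(IntWritable.class);
        conf.setMapperClass(Map.class);
        conf.setReducerClass(Reduce.class);

        FileInputFormat.setInputPaths(conf, new Path(args[0]));
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));
        JobClient.runJob(conf);
    }
}
```

Note that JVM reuse mainly pays off for jobs with many short tasks; with only 2 tiny input files you get roughly 2 map tasks, so you should still expect a fixed per-job overhead of many seconds on a single node.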
There is also some ongoing work on this in Apache Hadoop:
https://issues.apache.org/jira/browse/MAPREDUCE-1220
I'm not sure which release it will be included in, or what the current state of the JIRA is.
This is not unusual. Hadoop really comes into its own with large datasets; what you are seeing is mostly Hadoop's fixed startup overhead.