Hadoop WordCount example problem: do I need to do some performance tuning?

开发者 (Developer) https://www.devze.com 2023-03-26 06:28 Source: the web
I'm new to Hadoop.

I recently implemented the WordCount example.

But when I run the program on my single node with 2 input files containing just 9 words, it takes nearly 33 seconds to finish! That seems crazy, and it confuses me.

Can anyone tell me whether this is normal?

How can I fix this? Remember, I only created 2 input files with 9 words in them.

Submit Host Address: 127.0.0.1

Job-ACLs: All users are allowed

Job Setup: Successful

Status: Succeeded

Started at: Fri Aug 05 14:27:22 CST 2011

Finished at: Fri Aug 05 14:27:53 CST 2011

Finished in: 30sec


Hadoop is not efficient for very small jobs, because JVM startup, process initialization, and other fixed overheads take up most of the run time. It can be optimized to some extent by enabling JVM reuse:

http://hadoop.apache.org/common/docs/r0.20.2/mapred_tutorial.html#Task+JVM+Reuse
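As a rough sketch of what the Task JVM Reuse section of that tutorial describes: in the old `org.apache.hadoop.mapred` API (Hadoop 0.20.x), reuse is controlled by the `mapred.job.reuse.jvm.num.tasks` property, which `JobConf.setNumTasksToExecutePerJvm(int)` sets. The class name and job name below are made up for illustration, the mapper, reducer, and input/output setup of the actual WordCount job are omitted, and the code needs the Hadoop 0.20.x jars on the classpath. How much this helps for a job with only two tiny input files is not something the answer establishes.

```java
import org.apache.hadoop.mapred.JobConf;

/** Sketch only: shows where JVM reuse is switched on in a job configuration. */
public class JvmReuseExample {
    public static void main(String[] args) {
        JobConf conf = new JobConf(JvmReuseExample.class);
        conf.setJobName("wordcount-jvm-reuse"); // hypothetical job name

        // 1 (the default) means one task per JVM, i.e. no reuse.
        // -1 means no limit on how many tasks of this job a JVM may run.
        conf.setNumTasksToExecutePerJvm(-1);

        // Same effect as setting the property directly:
        // conf.setInt("mapred.job.reuse.jvm.num.tasks", -1);

        // Print the configured value to confirm the setting took effect.
        System.out.println(conf.getNumTasksToExecutePerJvm());
    }
}
```

In a real driver, these lines would sit next to the existing mapper, reducer, and path configuration, before the job is submitted.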

There is also some ongoing work on this in Apache Hadoop:

https://issues.apache.org/jira/browse/MAPREDUCE-1220

I'm not sure which release will include it or what the current state of the JIRA issue is.


This is not unusual. Hadoop only pays off with large datasets. What you are seeing is probably Hadoop's initial startup time.
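One way to convince yourself that the time goes to startup rather than to the counting is to run the same logic as a plain Java program with no Hadoop involved. The sketch below is only an illustration: the class name `LocalWordCount` and the two strings standing in for the input files are made up (assumed here to hold 9 words between them), and the timing it prints will vary from machine to machine.

```java
import java.util.Map;
import java.util.TreeMap;

/** Plain-Java word count, used only to time the counting work itself. */
final class LocalWordCount {

    /** Counts whitespace-separated words across all given input strings. */
    static Map<String, Integer> count(String... inputs) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String input : inputs) {
            for (String word : input.trim().split("\\s+")) {
                if (!word.isEmpty()) {
                    counts.merge(word, 1, Integer::sum);
                }
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Hypothetical stand-ins for the two small input files (9 words total).
        String file1 = "hello hadoop hello world";
        String file2 = "goodbye hadoop hello big world";

        long start = System.nanoTime();
        Map<String, Integer> counts = count(file1, file2);
        long elapsedMicros = (System.nanoTime() - start) / 1_000;

        System.out.println(counts);
        System.out.println("counting took about " + elapsedMicros + " microseconds");
    }
}
```

If the counting itself finishes in a tiny fraction of a second here, the roughly 30 seconds reported by the job tracker has to come from job setup, task scheduling, and JVM launches rather than from processing the words.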
