I am using Hadoop to process a large set of data. I set up a Hadoop node to use multiple volumes: one of these volumes is a NAS with a 10 TB disk, and the other is the server's local disk with a storage capacity of 400 GB.
The problem is, if I understand correctly, that data nodes will attempt to place an equal amount of data in each volume. Thus when I run a job on a large set of data, the 400 GB disk quickly fills up, while the 10 TB disk still has plenty of space remaining. Then the map-reduce job produced by Hive freezes because my cluster goes into safe mode. I tried to set the property for limiting the data node's disk usage, but it did nothing: I still have the same problem. I hope someone can help me. Well, it seems that my map-reduce program triggers safe mode because:
The ratio of reported blocks 0.0000 has not reached the threshold 0.9990.
I saw that error on the namenode web interface. I would like to disable this check with the property dfs.safemode.threshold.pct, but I do not know if that is a good way to solve it.
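For context, a minimal sketch of what overriding that property in hdfs-site.xml might look like (0.999f is the Hadoop default; a value of 0 or below makes the namenode skip the block-ratio check entirely, so lowering it only hides the underlying problem):
<property>
<name>dfs.safemode.threshold.pct</name>
<!-- default is 0.999f; values <= 0 disable the reported-blocks check -->
<value>0.999f</value>
</property>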
I think you can turn to dfs.datanode.fsdataset.volume.choosing.policy for help:
<property>
<name>dfs.datanode.fsdataset.volume.choosing.policy</name>
<value>org.apache.hadoop.hdfs.server.datanode.fsdataset.AvailableSpaceVolumeChoosingPolicy</value>
</property>
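If you go this route, there are also two related properties that tune how strongly the policy prefers the volumes with more free space; the values below are just the Hadoop defaults, shown as an illustration:
<property>
<name>dfs.datanode.available-space-volume-choosing-policy.balanced-space-threshold</name>
<!-- volumes whose free space differs by less than this many bytes are considered balanced; default 10 GB -->
<value>10737418240</value>
</property>
<property>
<name>dfs.datanode.available-space-volume-choosing-policy.balanced-space-preference-fraction</name>
<!-- fraction of new block allocations sent to the volumes with more available space; default 0.75 -->
<value>0.75</value>
</property>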
Use the dfs.datanode.du.reserved configuration setting in $HADOOP_HOME/conf/hdfs-site.xml to limit disk usage.
Reference
<property>
<name>dfs.datanode.du.reserved</name>
<!-- cluster variant -->
<value>182400</value>
<description>Reserved space in bytes per volume. Always leave this much space free for non dfs use.
</description>
</property>