I am using Hadoop to process a large set of data. I set up a Hadoop node to use multiple volumes: one of these volumes is a NAS with a 10 TB disk, and the other is the server's local disk with a storage capacity of 400 GB.
The problem is, if I understand correctly, that data nodes will attempt to place an equal amount of data in each volume. Thus when I run a job on a large set of data, the 400 GB disk quickly fills up, while the 10 TB disk still has plenty of space remaining. Then the map-reduce job produced by Hive freezes because my cluster goes into safe mode. I tried to set the property for limiting the data node's disk usage, but it did nothing: I still have the same problem. I hope someone can help me. Well, it seems that my map-reduce program triggers safe mode because:
The ratio of reported blocks 0.0000 has not reached the threshold 0.9990.
I saw that error on the namenode web interface. I would like to disable this check with the property dfs.safemode.threshold.pct, but I do not know if that is a good way to solve it.
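For context, a minimal sketch of what overriding that property in hdfs-site.xml might look like (0.999f is the Hadoop default; a value of 0 or below makes the namenode skip the block-ratio check entirely, so lowering it only hides the underlying problem):
<property>
<name>dfs.safemode.threshold.pct</name>
<!-- default is 0.999f; values <= 0 disable the reported-blocks check -->
<value>0.999f</value>
</property>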
I think you can turn to dfs.datanode.fsdataset.volume.choosing.policy for help:
<property>
<name>dfs.datanode.fsdataset.volume.choosing.policy</name>
<value>org.apache.hadoop.hdfs.server.datanode.fsdataset.AvailableSpaceVolumeChoosingPolicy</value>
</property>
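If you go this route, there are also two related properties that tune how strongly the policy prefers the volumes with more free space; the values below are just the Hadoop defaults, shown as an illustration:
<property>
<name>dfs.datanode.available-space-volume-choosing-policy.balanced-space-threshold</name>
<!-- volumes whose free space differs by less than this many bytes are considered balanced; default 10 GB -->
<value>10737418240</value>
</property>
<property>
<name>dfs.datanode.available-space-volume-choosing-policy.balanced-space-preference-fraction</name>
<!-- fraction of new block allocations sent to the volumes with more available space; default 0.75 -->
<value>0.75</value>
</property>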
Use the dfs.datanode.du.reserved configuration setting in $HADOOP_HOME/conf/hdfs-site.xml to limit disk usage.
Reference
<property>
<name>dfs.datanode.du.reserved</name>
<!-- cluster variant -->
<value>182400</value>
<description>Reserved space in bytes per volume. Always leave this much space free for non dfs use.
</description>
</property>