Change block size of dfs file

开发者 https://www.devze.com 2022-12-27 16:01 Source: web
My map is currently inefficient when parsing one particular set of files (a total of 2 TB). I'd like to change the block size of files in the Hadoop dfs (from 64MB to 128 MB). I can't find how to do it in the documentation for only one set of files and not the entire cluster.

Which command changes the block size when I upload? (Such as copying from local to dfs.)


For me, I had to slightly change Bkkbrad's answer to get it to work with my setup, in case anyone else finds this question later on. I've got Hadoop 0.20 running on Ubuntu 10.10:

hadoop fs -D dfs.block.size=134217728 -put local_name remote_location

The setting for me is not fs.local.block.size but rather dfs.block.size


I've changed my answer! You just need to set the fs.local.block.size configuration setting appropriately when you use the command line.

hadoop fs -D fs.local.block.size=134217728 -put local_name remote_location

Original Answer

You can programmatically specify the block size when you create a file with the Hadoop API. Unfortunately, you can't do this on the command line with the hadoop fs -put command. To do what you want, you'll have to write your own code to copy the local file to a remote location. It's not hard: open a FileInputStream for the local file, create the remote OutputStream with FileSystem.create, and then use something like IOUtils.copy from Apache Commons IO to copy between the two streams.


In the conf/ folder, you can change the value of dfs.block.size in the hdfs-site.xml configuration file. In Hadoop 1.0 the default block size is 64 MB, and in Hadoop 2.0 it is 128 MB.

<property>
    <name>dfs.block.size</name>
    <value>134217728</value>
    <description>Block size</description>
</property>


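You can also set the block size in your program like this:

import org.apache.hadoop.conf.Configuration;

Configuration conf = new Configuration();

// The value is in bytes. setLong is used because set() only accepts String values.
conf.setLong("dfs.block.size", 128L * 1024 * 1024);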


We can change the block size using the dfs.block.size property in the hdfs-site.xml file. Note: the size must be given in bytes, not bits. For example: 134217728 bytes = 128 MB.
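To double-check the numbers used in this thread, here is a small sketch in plain Java (no Hadoop dependency). The class and method names (BlockSizeMath, blockSizeBytes, blockCount) are made up for illustration. It converts a size in MB to the byte value that dfs.block.size expects, and estimates how many blocks a 2 TB data set would occupy at 64 MB versus 128 MB. The block count is only a rough estimate: it treats the data as one contiguous file, whereas in practice every file ends with its own partial block.

```java
public class BlockSizeMath {
    // Bytes in one megabyte (binary, 1024 * 1024).
    static final long MB = 1024L * 1024L;

    // Converts a size in MB to the byte count used as the dfs.block.size value.
    static long blockSizeBytes(long megabytes) {
        return megabytes * MB;
    }

    // Number of blocks needed to hold totalBytes, rounding up for a partial last block.
    // Assumption: the data is treated as a single contiguous file.
    static long blockCount(long totalBytes, long blockSize) {
        return (totalBytes + blockSize - 1) / blockSize;
    }

    public static void main(String[] args) {
        long twoTb = 2L * MB * MB; // 2 TB = 2 * 1024^4 bytes

        System.out.println("128 MB in bytes: " + blockSizeBytes(128));
        System.out.println("Blocks at 64 MB:  " + blockCount(twoTb, blockSizeBytes(64)));
        System.out.println("Blocks at 128 MB: " + blockCount(twoTb, blockSizeBytes(128)));
    }
}
```

Doubling the block size halves the estimated block count, which is presumably why the asker expects fewer, larger map inputs for this set of files.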
