Large Number of file concatenation

Source: https://www.devze.com, 2023-03-25 15:49 (reposted from the web)
I have around 3-4 million files in a directory, with filenames ending in, say, type1.txt and type2.txt (the files are 1type1.txt, 1type2.txt, 2type1.txt, 2type2.txt, etc.).

Now I want to concatenate all files ending with type1.txt into one file, and likewise all files ending with type2.txt.

Currently I am running cat *type1.txt > allTtype1.txt, and similarly for type2.txt. I want to preserve the file order in both output files, and my guess is that cat does that. But it is too slow.

Please suggest a faster way to do this.

Thanks, Ravi


You can do this using this command:

ls | while IFS= read -r file; do cat "$file" >> "allTtype${file#*type}"; done

But as snap says in his answer, each time cat opens a file by name, the kernel has to look that name up to find its inode, which takes a long time in a directory with this many files. To try to speed things up, you could cat by inode number using icat from The Sleuth Kit (replace /dev/sda1 with the device that holds the directory):

ls -i | while read -r -a file_array; do icat /dev/sda1 "${file_array[0]}" >> "allTtype${file_array[1]#*type}"; done

Even better, you can write the resulting files to another directory, so that they do not end up in the directory being listed:

ls -i | while read -r -a file_array; do icat /dev/sda1 "${file_array[0]}" >> "/newdir/allTtype${file_array[1]#*type}"; done
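
On the ordering question: ls and shell globs sort names lexicographically, so 10type1.txt comes before 2type1.txt. If "order" means the numeric order of the leading number, something like the following might work. It is only a sketch, assuming every filename starts with a number and contains no newlines; the scratch directory and sample files are made up for the demo.

```shell
#!/bin/sh
# Hypothetical demo layout: three small files in a scratch directory.
work=$(mktemp -d)
mkdir "$work/src" "$work/out"
for n in 1 2 10; do
    printf 'line %s\n' "$n" > "$work/src/${n}type1.txt"
done
cd "$work/src" || exit 1

# sort -n orders the names by their leading number: 1, 2, 10.
# The single redirection after "done" opens the output file once,
# instead of reopening it with >> for every input file.
ls | grep 'type1\.txt$' | sort -n | while IFS= read -r file; do
    cat "$file"
done > "$work/out/allTtype1.txt"
```

This still starts one cat process per file, so it addresses ordering rather than speed.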


cat itself is not slow. But every time you expand a shell wildcard (? or *), the shell reads and searches through all the file names in that directory, which is very slow.

The kernel also takes time to find each file when you open it by name, which you cannot avoid. How long depends on the file system in use (unspecified in the question): some file systems handle huge directories more intelligently than others.

To sort this out you might benefit from taking a file listing once:

ls > /tmp/filelist

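...and then using grep or similar to select the files out of that list. With millions of names, pass them to cat through xargs rather than a backtick substitution, so that the argument list does not exceed the system limit:

grep foo /tmp/filelist | xargs cat > /out/bar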
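
A minimal sketch of the list-once idea applied to the two suffixes from the question. It assumes an xargs that supports -0 (GNU and BSD do) and filenames without embedded newlines; the src and out directories and the sample files are invented for the demo.

```shell
#!/bin/sh
# Hypothetical demo layout: a scratch directory with a few sample files.
work=$(mktemp -d)
mkdir "$work/src" "$work/out"
for n in 1 2 3; do
    printf 'a%s\n' "$n" > "$work/src/${n}type1.txt"
    printf 'b%s\n' "$n" > "$work/src/${n}type2.txt"
done
cd "$work/src" || exit 1

# Read the directory once; LC_ALL=C gives a stable byte-wise sort order.
LC_ALL=C ls > "$work/filelist"

# Select each suffix from the saved list and let xargs run cat in batches.
# tr turns newlines into NUL bytes so names with spaces or quotes survive.
# Output goes outside src, so the results never show up in a later listing.
for t in type1 type2; do
    grep "${t}\.txt\$" "$work/filelist" | tr '\n' '\0' | xargs -0 cat > "$work/out/all_${t}.txt"
done
```

xargs keeps the input order and runs its cat batches one after another into the same output, so the files are concatenated in listing order.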

After you have sorted this mess out, make sure to structure your storage/application so that this never happens again. :) Also make sure to rmdir the existing directory after you have moved your files out of it: on many file systems a directory that once held millions of entries does not shrink, so reusing it stays slow even if there is just a single file in it.
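
One possible way to restructure, shown only as a sketch: spread the files over subdirectories keyed by their leading number. The bucket size of 1000, the directory names, and the assumption that every name starts with a decimal number without leading zeros are all illustrative, not taken from the question.

```shell
#!/bin/sh
# Hypothetical demo: move files from one flat directory into numbered buckets.
work=$(mktemp -d)
mkdir "$work/flat" "$work/sharded"
for n in 5 1500 2999; do
    printf 'x\n' > "$work/flat/${n}type1.txt"
done

per_dir=1000   # assumed bucket size; tune for the file system in use

ls "$work/flat" | while IFS= read -r f; do
    num=${f%%type*}                 # leading number, e.g. 1500 from 1500type1.txt
    bucket=$((num / per_dir))       # integer division: 1500 -> bucket 1
    mkdir -p "$work/sharded/$bucket"
    mv "$work/flat/$f" "$work/sharded/$bucket/$f"
done

# The old directory is now empty, so it can be removed.
rmdir "$work/flat"
```

This starts a mkdir and an mv per file, which is slow for millions of files, but it only has to run once.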
