Minimize disk usage while doing unix sort

Source: https://www.devze.com, 2023-03-28 03:11 (reposted from the web)
I have a lot of files, say 1000 files of 4 MB each, about 4 GB in total. I would like to sort them with Unix sort. Here is my command:

sort -t ',' -k 1,1 -k 5,7 -k 22,22 -k 2,2r INPUT_UNSORTED_${current_time}.DAT -o INPUT_SORTED_${current_time}.DAT

where INPUT_UNSORTED is a big file created by concatenating the 1000 files, so it takes another 4 GB. INPUT_SORTED takes another 4 GB as well.

I also discovered that Unix sort uses a temp folder while sorting, and the temp files may reach 4 GB too.

How can I reduce disk usage without losing performance?


Is your goal to get a single big sorted output file? Take a look at sort's --merge option. You can sort the small input files individually, and then merge them all into the large sorted output. If you delete each unsorted input file immediately after producing its sorted counterpart, you won't use more than 4MB of space on intermediate results.
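
A minimal sketch of that approach, not a tested recipe. It is a small variation: each file is sorted in place with -o, so there is no separate unsorted copy to delete. The helper names sort_keys and sort_and_merge, and the assumption that the small files sit in one directory with a .DAT suffix, are made up for illustration; the key options are copied from the question. The merge step has to use the same -t and -k options as the per-file sorts.

```shell
# Hypothetical helper: runs sort with the key specification from the question.
sort_keys() {
    sort -t ',' -k 1,1 -k 5,7 -k 22,22 -k 2,2r "$@"
}

# Hypothetical helper. Usage: sort_and_merge DIR OUTPUT
# Assumes the small input files are DIR/*.DAT and OUTPUT is not one of them.
sort_and_merge() {
    dir=$1
    out=$2
    for f in "$dir"/*.DAT; do
        # -o may name the input file itself: sort reads all of its input
        # before it writes the output, so each file is replaced by its
        # sorted version and no extra unsorted copy is kept.
        sort_keys -o "$f" "$f" || return 1
    done
    # -m merges inputs that are already sorted instead of sorting them again.
    sort_keys -m -o "$out" "$dir"/*.DAT
}
```

It could be called as, for example, sort_and_merge ./parts INPUT_SORTED.DAT. This skips the concatenated INPUT_UNSORTED file entirely. One caveat: GNU sort merges a limited number of inputs at once (see its --batch-size option), so with 1000 inputs the merge step may still write some temporary files.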
