How do I sort a file that has a very long list of items?

Developer https://www.devze.com 2023-01-03 11:31 Source: Internet
I have a text file that has a very long list of items. So I want to sort them alphabetically but I do not want to load all the file into the memory (RAM).

I tried loading all the contents of the file into an array and sorting them as I normally do, but the system complains that there is not enough memory.

Thanks, Mohammad


You'll need to read up on external sorting. The basic approach is a divide-and-conquer routine such as merge sort: read and sort one portion of the file that fits in memory, write it out, then read and sort the next portion, and so on. When you reach the end of the file, merge the sorted portions together.
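As a rough illustration of that idea, here is a minimal Python sketch of an external merge sort. The question does not say which language is in use, so Python is an assumption, and the names external_sort, _with_newline and lines_per_chunk are made up for this example. It sorts each portion in memory, writes it to a temporary file, and then merges the sorted temporary files lazily with heapq.merge, so only one line per portion is held in memory during the merge.

```python
import heapq
import os
import tempfile
from itertools import islice


def _with_newline(line):
    # Hypothetical helper: ensure every line ends with "\n" so that a final
    # line without a trailing newline still merges cleanly.
    return line if line.endswith("\n") else line + "\n"


def external_sort(src_path, dst_path, lines_per_chunk=100_000):
    """Sort the lines of src_path into dst_path while holding at most
    lines_per_chunk lines in memory during the chunking phase."""
    chunk_paths = []
    try:
        # Phase 1: read a portion, sort it in memory, write it to a temp file.
        with open(src_path, "r", encoding="utf-8") as src:
            while True:
                chunk = [_with_newline(l) for l in islice(src, lines_per_chunk)]
                if not chunk:
                    break
                chunk.sort()
                fd, path = tempfile.mkstemp(suffix=".chunk", text=True)
                with os.fdopen(fd, "w", encoding="utf-8") as tmp:
                    tmp.writelines(chunk)
                chunk_paths.append(path)

        # Phase 2: merge the sorted portions. heapq.merge reads lazily,
        # pulling one line at a time from each temp file.
        files = [open(p, "r", encoding="utf-8") for p in chunk_paths]
        try:
            with open(dst_path, "w", encoding="utf-8") as dst:
                dst.writelines(heapq.merge(*files))
        finally:
            for f in files:
                f.close()
    finally:
        # Remove the temporary portion files whether or not the sort succeeded.
        for p in chunk_paths:
            os.remove(p)
```

Lines are compared by code point here, not by locale rules. The sketch also opens every portion file at once during the merge, so for a very large input you would likely need a larger lines_per_chunk or several merge passes to stay under the operating system's open-file limit.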


Maybe STXXL (Standard Template Library for Extra Large Data Sets) can help.

STXXL offers external sorting, among other things.


You don't have to hold the whole file in memory. If this is a task you don't have to do very often, you can write an application that sorts it very slowly, scanning the whole file once for every line it writes. Something like this (pseudo):

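set<int> linesProcessed;              // indexes of lines already written to newFile
for (int i = 0; i < lineCount; i++)   // one full pass over oldFile per output line
{
   string alphabeticalFirstLine;
   int lineIndex = -1;                // -1 means no candidate found in this pass yet
   int j = 0;                         // index of the current line within oldFile
   foreach line in oldFile
   {
       // skip lines written in earlier passes; keep the smallest remaining line
       if (linesProcessed does not contain j
           and (lineIndex == -1 or line is before alphabeticalFirstLine))
       {
            alphabeticalFirstLine = line;
            lineIndex = j;
       }
       j++;
   }
   write alphabeticalFirstLine to newFile;
   linesProcessed.add(lineIndex);
}
clear linesProcessed;
delete oldFile;
rename newFile to oldFile;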
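For anyone who wants to run something like this, below is a small Python sketch of the same repeated-scan idea. It is a variant, not a literal translation: instead of keeping a collection of processed line indexes, it remembers the last line written and, on each pass, picks the smallest line that comes after it, counting duplicates so they are written the right number of times. The function name slow_sort is made up for this example, and the output path is assumed to be a different file from the input.

```python
def slow_sort(src_path, dst_path):
    """Sort the lines of src_path into dst_path using constant memory,
    at the cost of one full pass over src_path per distinct line."""
    last = None  # the line most recently written to dst_path
    with open(dst_path, "w", encoding="utf-8") as dst:
        while True:
            smallest, count = None, 0
            with open(src_path, "r", encoding="utf-8") as src:
                for raw in src:
                    line = raw.rstrip("\n")
                    if last is not None and line <= last:
                        continue  # already written in an earlier pass
                    if smallest is None or line < smallest:
                        smallest, count = line, 1
                    elif line == smallest:
                        count += 1  # duplicate of the current candidate
            if smallest is None:
                break  # nothing left after `last`: every line is written
            dst.write((smallest + "\n") * count)
            last = smallest
```

The running time grows with the number of lines multiplied by the number of distinct lines, so this only makes sense for an occasional job where memory matters far more than speed.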


If you are using a Unix-like OS, you can use the sort command. It takes care of memory consumption for you. For example, something like "sort large_file > sorted_file" will do the job.

Or you can write your own external sort, or use one from a library. Tell us what language you are using and maybe someone can point you to the exact library to use.
