I need to sort a 10 GB file containing a list of numbers as fast as possible, using only 100 MB of memory. My approach is to break the file into chunks, sort each chunk, and then merge them.
I am currently using C FILE pointers, since they are faster than C++ file I/O (at least on my system).
I tried it on a 1 GB file and my code works fine, but it throws a segmentation fault as soon as I call fscanf after opening the 10 GB file.
FILE *fin;
FILE *fout;
fin = fopen( filename, "r" );
while( 1 ) {
// throws the error here
for( i = 0; i < MAX && ( fscanf( fin, "%d", &temp ) != EOF ); i++ ) {
v[i] = temp;
}
What should I use instead?
And do you have any suggestions on the best way to approach this?
There is a special class of algorithms for this called external sorting. There is a variant of merge sort that works as an external sorting algorithm (search for "external merge sort" or "merge sort tape"): sort chunks that fit in memory, write each sorted run to disk, then merge the runs.
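To make that concrete, here is a minimal sketch of an external merge sort in C. It assumes the numbers are 32-bit ints, one per line; the file names (numbers.txt, sorted.txt, run*.tmp) and the CHUNK size are placeholders you would adapt to your data and memory budget:

#include <stdio.h>
#include <stdlib.h>

#define CHUNK 20000000   /* ints per run: ~80 MB; tune to your memory budget */

static int cmp_int( const void *a, const void *b ) {
    int x = *(const int *)a, y = *(const int *)b;
    return ( x > y ) - ( x < y );
}

int main( void ) {
    /* Phase 1: read the input in chunks, sort each chunk, write sorted runs */
    FILE *fin = fopen( "numbers.txt", "r" );          /* placeholder name */
    if( !fin ) { perror( "fopen" ); return 1; }

    int *buf = malloc( CHUNK * sizeof *buf );
    int nruns = 0;
    for( ;; ) {
        int n = 0;
        while( n < CHUNK && fscanf( fin, "%d", &buf[n] ) == 1 )
            n++;
        if( n == 0 ) break;
        qsort( buf, n, sizeof *buf, cmp_int );

        char name[64];
        snprintf( name, sizeof name, "run%03d.tmp", nruns++ );
        FILE *out = fopen( name, "w" );
        for( int i = 0; i < n; i++ )
            fprintf( out, "%d\n", buf[i] );
        fclose( out );
    }
    fclose( fin );
    free( buf );

    /* Phase 2: k-way merge of the sorted runs */
    FILE **runs = malloc( nruns * sizeof *runs );
    int *head  = malloc( nruns * sizeof *head );      /* current value of each run */
    int *alive = malloc( nruns * sizeof *alive );     /* run still has data? */
    for( int i = 0; i < nruns; i++ ) {
        char name[64];
        snprintf( name, sizeof name, "run%03d.tmp", i );
        runs[i] = fopen( name, "r" );
        alive[i] = ( fscanf( runs[i], "%d", &head[i] ) == 1 );
    }

    FILE *fout = fopen( "sorted.txt", "w" );          /* placeholder name */
    for( ;; ) {
        int best = -1;
        for( int i = 0; i < nruns; i++ )              /* pick the smallest head */
            if( alive[i] && ( best < 0 || head[i] < head[best] ) )
                best = i;
        if( best < 0 ) break;                         /* all runs exhausted */
        fprintf( fout, "%d\n", head[best] );
        alive[best] = ( fscanf( runs[best], "%d", &head[best] ) == 1 );
    }
    fclose( fout );
    for( int i = 0; i < nruns; i++ ) fclose( runs[i] );
    return 0;
}

With many runs, replacing the linear scan over the run heads with a min-heap is faster, and binary I/O with large buffers beats fscanf/fprintf, but the overall structure stays the same.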
But if you're on Unix, it's probably easier to run the sort command in a separate process.
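GNU sort already does the chunk-and-merge spilling to temporary files for you; from a shell, something like sort -n -S 100M -o sorted.txt numbers.txt would do it (file names are placeholders, and -S, which caps the in-memory buffer, is a GNU extension). If you want to drive it from your C program, a minimal sketch:

#include <stdlib.h>

int main( void ) {
    /* -n: numeric sort, -S 100M: cap sort's memory use, -o: output file */
    int rc = system( "sort -n -S 100M -o sorted.txt numbers.txt" );
    return rc == 0 ? 0 : 1;
}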
By the way: opening files larger than 2 GB requires large file support, at least on 32-bit systems, which may be why the 10 GB file fails where the 1 GB one works. Depending on your operating system and C library, you either have to define a macro or call different file-handling functions.
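On glibc, for example, it is enough to define _FILE_OFFSET_BITS=64 before any system header is included (or pass -D_FILE_OFFSET_BITS=64 to the compiler); fopen, fseeko and ftello then handle files larger than 2 GB transparently. A minimal sketch:

#define _FILE_OFFSET_BITS 64   /* must appear before the first #include */
#include <stdio.h>

int main( void ) {
    FILE *fin = fopen( "numbers.txt", "r" );   /* placeholder name; now works for > 2 GB files */
    if( !fin ) { perror( "fopen" ); return 1; }
    /* ... read with fscanf as before ... */
    fclose( fin );
    return 0;
}

On a 64-bit build, off_t is already 64 bits wide, so this only matters for 32-bit targets.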