开发者

Read variable-length records from a buffer - weird memory issues

开发者 https://www.devze.com 2022-12-22 04:54 出处:网络
I\'m trying to implement an i/o intensive quicksort (C++ qsort) on a very large dataset. In the interests of speed, I\'d like to read in a chunk of data at a time into a buffer and then use qsort to s

I'm trying to implement an i/o intensive quicksort (C++ qsort) on a very large dataset. In the interests of speed, I'd like to read in a chunk of data at a time into a buffer and then use qsort to sort it inside the buffer. (I am currently working with text files but would like to move to binary soon.) However, my data is composed of variable-length records, and qsort needs to be told the length of the record in order to sort. Is there any way to standardize this? The only thing I could think of was rather convoluted: my program currently reads from the buffer until it hits a linefeed character ('10' in ascii), transferring each character over to another array. When it finds a linefeed (the delimiter in the input file), it fills the number of spaces remaining in the buffer for that record (record size is set to 30) with null characters. This way, I should end up with a buffer full of fixed-size records to give qsort.

I know there are several problems with my approach, one being that it's just clumsy, another that the record size might conceivably be larger than 30, but is generally much less. Is there a better way of doing this?

As well, my current code doesn't even work. When I debug it, it seems to be transferring characters from one buffer to the other, but when I try to print out the buffer, it contains only the first record.

Here is my code:

FILE *fp;
u开发者_StackOverflow中文版nsigned char *buff;
unsigned char *realbuff;
FILE *inputFiles[NUM_INPUT_FILES];
buff = (unsigned char *) malloc(2048);
realbuff = (unsigned char *) malloc(NUM_RECORDS * RECORD_SIZE);

fp = fopen("postings0.txt", "r");
if(fp)
{
    fread(buff, 1, 2048, fp);


    /*for(int i=0; i <30; i++)
     cout << buff[i] <<endl;*/

    int y=0;
    int recordcounter = 0;

    //cout << buff;
    for(int i=0;i <100; i++)
    {
        if(buff[i] != char(10))
        {
            realbuff[y] = buff[i];
            y++;
            recordcounter++;
        }        
        else
        {
            if(recordcounter < RECORD_SIZE)
                for(int j=recordcounter; j < RECORD_SIZE;j++)
                {
                    realbuff[y] = char(0);
                    y++;
                }
            recordcounter = 0;
        }
    } 

    cout << realbuff <<endl;   
    cout << buff;
}
else 
    cout << "sorry";

Thank you very much, bsg


The qsort function can only work on fixed length record (like you say). In order to sort variable length records, you need an array of pointers to them and then have qsort sort the array of pointers. This may be more efficient too, as pointers are much faster to move around than large chunks of data are.

The same goes for std::sort, which would be recommended because it is type safe. Just be sure to supply a comparison predicate (a less than function) taking pointers as its arguments as the third parameter.


How about using c++ file streams for parsing your file ?

Checkout this example(website name is strange, no offense!!) which returns the record as a STL vector and then you can use STL Sort algorithm.

0

精彩评论

暂无评论...
验证码 换一张
取 消