How should I iterate through a binary file in c++?

Developer https://www.devze.com 2023-03-16 04:22 Source: Internet
TL;DR

What would be a good way, in C++ and using STL idioms, to iterate through a binary file to read, transform, and then again write out the data? The files can be pretty large (several hundred MB) so I don't want to load the entire file into memory at one time.

More context

I am trying to improve a utility that performs various operations on binary files. These files contain a set of records, each consisting of a header followed by the data. The utility provides options to dump the file to text, filter out certain records, extract certain records, append records, and so on. Unfortunately, the code to read and write the file has been copied and pasted into every one of these functions, so the single source file contains a lot of redundant code and is starting to get out of hand.

I'm only just getting up to speed with C++ and the STL, but this seems like something that should be doable with some sort of template/iterator magic; I just can't find a good example explaining this scenario. The other strategy I may pursue is to wrap the file access in a class that provides GetNextRecord and WriteNextRecord methods.

Below is a self-contained/(extremely) simplified version of what I'm working on. Is there a good way to write a function to read the data in the file created by WriteMyDataFile and create a new output file that removes all the records containing an 'i' character? I'm looking to abstract away the reading/writing of the file so that the function can mainly be about working with the data.

#include <fstream>
#include <iostream>
#include <sstream>
#include <string>
#include <vector>

using namespace std;

const int c_version = 1;

struct RecordHeader
{
    int length;      
    int version;
};

void WriteMyDataFile(char* recordFile, char* data)
{
    ofstream output (recordFile, ios::out | ios::binary);

    stringstream records(data);

    while(records)
    {
        string r;
        records >> r;

        if(r.length() < 1)
        {
            continue;
        }

        RecordHeader header;
        header.length = r.length();
        header.version = c_version;

        output.write((char*)&header, sizeof(header));
        output.write(r.data(), header.length);
    }

    output.close();
}

 vector<string> ReadDataFile(char* recordFile)
 {
    vector<string> records;
    ifstream input (recordFile, ios::in | ios::binary);

    while(!input.eof())
    {
        RecordHeader header;
        input.read((char*)&header, sizeof(header));

        if(!input.eof())
        {
            char* buffer = new char[header.length + 1];

            input.read(buffer, header.length);
            buffer[header.length] = '\0';

            string s(buffer);
            records.push_back(s);

            delete[] buffer;
        }
    }
    return records;
}


int main(int argc, char *argv[])
{
    WriteMyDataFile(argv[1], argv[2]);
    vector<string> records = ReadDataFile(argv[1]);

    for(int i=0; i < records.size(); i++)
    {
        cout << records[i] << endl;
    }

    return 0;
}

To run this:

C:\>RecordUtility.exe test.bin "alpha bravo charlie delta"

Output:

alpha

bravo

charlie

delta


I'd handle this by overloading operator>> and operator<< for your Record type:

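struct Record { 
    struct header {
        int length;
        int version;
    };

    header h;
    std::vector<char> body;
};

std::istream &operator>>(std::istream &is, Record &r) {
    // Read the fixed-size header first; stop if the stream is exhausted.
    if (!is.read((char *)&r.h, sizeof(r.h)))
        return is;
    // Size the body from the header, then read the payload into it.
    r.body.resize(r.h.length);
    if (r.h.length > 0)
        is.read(&r.body[0], r.h.length);
    return is;
}

std::ostream &operator<<(std::ostream &os, Record const &r) { 
    // Write the header, then the payload bytes (if there are any).
    os.write((char const *)&r.h, sizeof(r.h));
    if (!r.body.empty())
        os.write(&r.body[0], r.body.size());
    return os;
}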

Once you've done that, you can use istream_iterator and ostream_iterator with a stream of those structures. For example, to do the copy roughly equivalent to what you have above would be something like:

std::ifstream in("some input file", std::ios::binary);

std::copy(std::istream_iterator<Record>(in), 
          std::istream_iterator<Record>(),
          std::ostream_iterator<Record>(std::cout, "\n"));

Or if, for example, you wanted to copy only those records with a Version number of 2 or higher, you could do something like:

struct filter { // or use a lambda in C++11
    bool operator()(Record const &r) const { return r.h.version < 2; }
};

std::remove_copy_if(std::istream_iterator<Record>(in),
                    std::istream_iterator<Record>(),
                    std::ostream_iterator<Record>(std::cout, "\n"),
                    filter());
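Applied to the question's actual task (dropping every record that contains an 'i'), the same approach might look like the sketch below. It is a minimal sketch under some assumptions: it needs C++11 for the lambda, it assumes the file is read on the same platform that wrote it (the header is copied as raw bytes), and the names CopyRecordsWithout and MakeRecord are made up for illustration. Only one record is held in memory at a time, so the file size should not matter.

```cpp
#include <algorithm>
#include <cstddef>
#include <istream>
#include <iterator>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Same layout as the question's RecordHeader, plus the payload.
struct Record {
    struct Header {
        int length = 0;
        int version = 0;
    };
    Header h;
    std::vector<char> body;
};

std::istream &operator>>(std::istream &is, Record &r) {
    Record::Header h;
    if (!is.read(reinterpret_cast<char *>(&h), sizeof h))
        return is;                          // end of input or short header
    if (h.length < 0) {                     // corrupt length field
        is.setstate(std::ios::failbit);
        return is;
    }
    std::vector<char> body(static_cast<std::size_t>(h.length));
    if (h.length > 0 && !is.read(body.data(), h.length))
        return is;                          // truncated payload
    r.h = h;
    r.body.swap(body);
    return is;
}

std::ostream &operator<<(std::ostream &os, Record const &r) {
    os.write(reinterpret_cast<char const *>(&r.h), sizeof r.h);
    if (!r.body.empty())
        os.write(r.body.data(), static_cast<std::streamsize>(r.body.size()));
    return os;
}

// Hypothetical helper: copy every record whose body does not contain c.
void CopyRecordsWithout(std::istream &in, std::ostream &out, char c) {
    std::remove_copy_if(std::istream_iterator<Record>(in),
                        std::istream_iterator<Record>(),
                        std::ostream_iterator<Record>(out),
                        [c](Record const &r) {
                            return std::find(r.body.begin(), r.body.end(), c)
                                   != r.body.end();
                        });
}

// Hypothetical helper for trying this out: build a record from text.
Record MakeRecord(std::string const &text) {
    Record r;
    r.h.length = static_cast<int>(text.size());
    r.h.version = 1;
    r.body.assign(text.begin(), text.end());
    return r;
}
```

With real files you would pass a std::ifstream and a std::ofstream, both opened with std::ios::binary, to CopyRecordsWithout. Note that the ostream_iterator is constructed without a delimiter here, so nothing extra is written between the binary records.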


Instead of this:

while(!input.eof())

It is easier (and clearer) to write:

RecordHeader header;
while(input.read((char*)&header, sizeof(header)))
{
    // The header was read successfully; read header.length bytes of data here.
}

To get the template magic you want, use std::istream_iterator and std::ostream_iterator.

This basically requires you to write operator>> and operator<< for your class.

PS. I hate the use of binary objects (RecordHeader). It makes the code harder to maintain. Stream the object so that it knows how to read itself back in, which leads back to operators >> and <<.
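The reason the read-as-condition form is safer: eof() only becomes true after a read has already tried to go past the end, so testing it before reading lets one failed read slip through. A rough rewrite of the question's reader along these lines is sketched below. It takes a std::istream rather than a file name so it can be tried on a stringstream; ReadRecords and WriteRecord are illustrative names, not part of the original code.

```cpp
#include <cstddef>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

struct RecordHeader {
    int length;
    int version;
};

// Reads records until a header can no longer be read.
std::vector<std::string> ReadRecords(std::istream &input) {
    std::vector<std::string> records;
    RecordHeader header;
    // The read itself is the loop condition, so a failed read ends the loop.
    while (input.read(reinterpret_cast<char *>(&header), sizeof(header))) {
        if (header.length < 0)
            break;                                  // corrupt header: stop
        // Sizing the string up front avoids new[]/delete[] and keeps any
        // embedded '\0' bytes in the data.
        std::string body(static_cast<std::size_t>(header.length), '\0');
        if (header.length > 0 && !input.read(&body[0], header.length))
            break;                                  // truncated record: stop
        records.push_back(body);
    }
    return records;
}

// Hypothetical helper for trying this out: append one record to a stream.
void WriteRecord(std::ostream &output, const std::string &text) {
    RecordHeader header;
    header.length = static_cast<int>(text.size());
    header.version = 1;  // assumption: same value as c_version in the question
    output.write(reinterpret_cast<const char *>(&header), sizeof(header));
    output.write(text.data(), header.length);
}
```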


The code you posted and your idea about a wrapper class look to me like the best way to do this with the STL.

If you want to provide the plain data to your main program, you may want to look at Boost.Iostreams (boost::iostreams). It provides some good ways to implement filters "into" a stream (for example, a zlib filter), and may be what you are looking for.
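For the wrapper-class route, one possible shape is sketched below. The method names GetNextRecord and WriteNextRecord come from the question; everything else (the class names RecordReader and RecordWriter, the FilterRecords function, passing streams by reference) is an assumption made for illustration. The point is that the file layout lives in two small classes, and the filtering function only deals with strings.

```cpp
#include <cstddef>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>

struct RecordHeader {
    int length;
    int version;
};

// Hypothetical wrapper that knows how a record is laid out when reading.
class RecordReader {
public:
    explicit RecordReader(std::istream &in) : in_(in) {}

    // Returns false at end of input or on a malformed record.
    bool GetNextRecord(std::string &data) {
        RecordHeader header;
        if (!in_.read(reinterpret_cast<char *>(&header), sizeof(header)))
            return false;
        if (header.length < 0)
            return false;
        std::string body(static_cast<std::size_t>(header.length), '\0');
        if (header.length > 0 && !in_.read(&body[0], header.length))
            return false;
        data.swap(body);
        return true;
    }

private:
    std::istream &in_;
};

// Hypothetical wrapper that knows how a record is laid out when writing.
class RecordWriter {
public:
    explicit RecordWriter(std::ostream &out) : out_(out) {}

    void WriteNextRecord(const std::string &data) {
        RecordHeader header;
        header.length = static_cast<int>(data.size());
        header.version = 1;  // assumption: same value as c_version in the question
        out_.write(reinterpret_cast<const char *>(&header), sizeof(header));
        out_.write(data.data(), header.length);
    }

private:
    std::ostream &out_;
};

// Copies records from in to out, dropping any record that contains c.
// Returns the number of records written.
int FilterRecords(std::istream &in, std::ostream &out, char c) {
    RecordReader reader(in);
    RecordWriter writer(out);
    std::string data;
    int written = 0;
    while (reader.GetNextRecord(data)) {
        if (data.find(c) == std::string::npos) {
            writer.WriteNextRecord(data);
            ++written;
        }
    }
    return written;
}
```

Because only one record is in memory at a time, this should cope with files of several hundred MB. Dump, extract, and append operations could reuse the same two classes.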


You can create your own stream operators, operator<< and operator>>, to manage the reading and writing of your Record structures from a stream. Then you can run things through a vector of records, applying whatever filtering you desire (perhaps with std::remove_if, for the example in the question), and write the result back, similar to the code below:

#include <algorithm>
#include <vector>
#include <iostream>
#include <iterator>
#include <stdexcept>
#include <sstream>
#include <string>

namespace {
    template <class Type>
    void WriteBinary(const Type& data, std::ostream& os)
    {
        const char *binaryData = reinterpret_cast<const char*>(&data);
        os.write(binaryData, sizeof(data));
    }

    template <class Type>
    Type ReadBinary(std::istream& is)
    {
        Type data;
        is.read(reinterpret_cast<char*>(&data), sizeof(data));
        return data;
    }
}

struct Record
{
    int               mVersion;
    std::vector<char> mData;
};

std::ostream& operator<<(std::ostream& os, const Record& record)
{
    WriteBinary(record.mData.size(), os);
    WriteBinary(record.mVersion, os);

    std::copy(record.mData.begin(), 
              record.mData.end(), 
              std::ostream_iterator<char>(os)); 

    return os;
}

std::istream& operator>>(std::istream& is, Record& record)
{
    if (std::char_traits<char>::not_eof(is.peek()))
    {
        typedef std::vector<char>::size_type size_type;

        size_type length = ReadBinary<size_type>(is);
        record.mVersion = ReadBinary<int>(is);

        if (record.mVersion != 1)
        {
            throw std::runtime_error("Invalid version number.");
        }

        record.mData.clear();
        record.mData.resize(length);
        is.read(&record.mData.front(), length);
    }
    else
    {
        // Read the EOF char to invalidate the stream.
        is.ignore();
    }

    return is;
}

int main()
{
    // Create a Record
    std::string str = "Hello";

    Record rec;
    rec.mVersion = 1;
    rec.mData.assign(str.begin(), str.end());

    // Write two copies of the record to the stream.
    std::stringstream ss;
    ss << rec << rec;

    // Read all the records in the "file"
    std::vector<Record> records((std::istream_iterator<Record>(ss)),
                                std::istream_iterator<Record>());

    std::cout << "Read " << records.size() << " records." << std::endl;

    // Manipulate records here...then write all of them back to a file.
    std::stringstream myNewFile;
    std::copy(records.begin(), 
              records.end(), 
              std::ostream_iterator<Record>(myNewFile));

    return 0;
}
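The step marked "Manipulate records here" could, for the question's example, be filled in with the erase-remove idiom. The sketch below repeats the Record struct so that it stands alone; ContainsI, RemoveRecordsContainingI, and MakeRecord are illustrative names. Keep in mind that this variant holds every record in a vector at once, which the question wanted to avoid for very large files.

```cpp
#include <algorithm>
#include <string>
#include <vector>

struct Record
{
    int               mVersion;
    std::vector<char> mData;
};

// Predicate: true when the record's data contains the character 'i'.
bool ContainsI(const Record& record)
{
    return std::find(record.mData.begin(), record.mData.end(), 'i')
           != record.mData.end();
}

// std::remove_if moves the records to keep to the front and returns the
// new logical end; erase then drops the leftover tail.
void RemoveRecordsContainingI(std::vector<Record>& records)
{
    records.erase(std::remove_if(records.begin(), records.end(), ContainsI),
                  records.end());
}

// Hypothetical helper for trying this out: build a record from text.
Record MakeRecord(const std::string& text)
{
    Record record;
    record.mVersion = 1;
    record.mData.assign(text.begin(), text.end());
    return record;
}
```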