Should we avoid repetitive code in C++ in order to be "Pythonic", and how?_问答_开发者

I am in larval stage with Python and pre-egg stage in C++, but i am trying to do my best, specially with the "Don't Repeat Yourself" principle.

I have a multichannel raw file-format to open, with a main ascii header with fields representable as strings and integers (always coded as chars padded with white spaces). The second part is N headers, with N being a field of the main header, and each of those headers has itself a lot more of text and number fields (coded as ascii) refering to the length and size of the actual 16 bit multichannel streams that compose the rest of the file.

So far, I have this working code in C++:

#include <iostream>
#include <sstream>
#include <fstream>
#include <string>
#include <map>

using namespace std;

struct Header {
    string version;
    string patinfo;
    string recinfo;
    string start_date;
    string start_time;
    int header_bytes;
    string reserved;
    int nrecs;
    double rec_duration;
    int nchannels;
};

struct Channel {
    string label;
    string transducertype;
    string phys_dim;
    int pmin;
    int pmax;
    int dmin;
    int dmax;
    string prefiltering;
    int n_samples;
    string reserved;
};


int main()
{
    ifstream edf("/home/helton/Dropbox/01MIOTEC/06APNÉIA/Samples/Osas2002plusQRS.rec", ios::binary);

    // prepare to read file header
    Header header;
    char buffer[80];

    // reads header fields into the struct 'header'
    edf.read(buffer, 8);
    header.version = string(buffer, 8);

    edf.read(buffer, 80);
    header.patinfo = string(buffer, 80);

    edf.read(bu开发者_C百科ffer, 80);
    header.recinfo = string(buffer, 80);

    edf.read(buffer, 8);
    header.start_date = string(buffer, 8);

    edf.read(buffer, 8);
    header.start_time = string(buffer, 8);

    edf.read(buffer, 8);
    stringstream(buffer) >> header.header_bytes;

    edf.read(buffer, 44);
    header.reserved = string(buffer, 44);

    edf.read(buffer, 8);
    stringstream(buffer) >> header.nrecs;

    edf.read(buffer,8);
    stringstream(buffer) >> header.rec_duration;

    edf.read(buffer,4);
    stringstream(buffer) >> header.nchannels;

    /*
    cout << "'" << header.version << "'" << endl;
    cout << "'" << header.patinfo << "'" << endl;
    cout << "'" << header.recinfo << "'" << endl;
    cout << "'" << header.start_date << "'" << endl;
    cout << "'" << header.start_time << "'" << endl;
    cout << "'" << header.header_bytes << "'" << endl;
    cout << "'" << header.reserved << "'" << endl;
    cout << "'" << header.nrecs << "'" << endl;
    cout << "'" << header.rec_duration << "'" << endl;
    cout << "'" << header.nchannels << "'" << endl;
    */

    // prepare to read channel headers
    int ns = header.nchannels; // ns tells how much channels I have
    char title[16]; // 16 is the specified length of the "label" field of each channel

    for (int n = 0; n < ns; n++)
    {
        edf >> title;
        cout << title << endl; // and this successfully echoes the label of each channel
    }


    return 0;
};

Some remarks I already have to make:

I opted to use struct because the format specification is very hardcoded;
I didn't iterate over the main header fields because the number of bytes and types to read seemed to me rather arbitrary;
Now that I successfully got each channel's label, I would actually create structs for each channel's fields, which by themselves would have to be stored perhaps in a map.

My (hopefully straightforward) question is:

"Should I worry about cutting corners to make this kind of code more 'Pythonic' (more abstract, less repetitive), or this is not the way things work in C++?"

Many Python evangelists (as I would be myself, because I love it) highlight its easyness to use and all that. So, I will wonder for some time if I am doing dumb things or only doing things right, but not so "automagical" because of the very nature of C++.

Thanks for reading

Helton

I'd say there's no such thing as Pythonic C++ code. The DRY principle applies in both languages, but much of what is considered "Pythonic" is simply the shortest, sweetest way of expressing logic in Python, using Python-specific constructs. Idiomatic C++ is quite different.

lambda, for example, is sometimes not considered very Pythonic and reserved for cases where no other solution exists, but is just being added to the C++ standard. C++ has no keyword arguments, which are very Pythonic. C++ programmers don't like constructing a map when not necessary, while a Python programmer might throw dict at a lot of problems where they just happen to make the intention clearer than the efficient alternative.

If you want to save typing, use the function I posted earlier, then:

header.version = read_field(edf, 8);
header.patinfo = read_field(edf, 80);

That should save you quite a few lines. But more important than those few lines is that you've achieved a small amount of modularity: how to read a field and what fields to read are now separate parts of your program.

You are correct: as written, the code is repetitive (and has no error checking). Each field that you read really requires you to take three or five steps, depending on the type of data being read:

Read the field from the stream
Ensure the read succeeded
Parse the data (if necessary)
Ensure the parse succeeded (if necessary)
Copy the data into the target location

You can wrap all three of these up into a function so that the code is less repetitive. For example, consider the following function templates:

template <typename TStream, typename TResult>
void ReadFixedWidthFieldFromStream(TStream& str, TResult& result, unsigned sz) 
{
    std::vector<char> data(sz);

    if (!str.read(&data[0], sz))
        throw std::runtime_error("Failed to read from stream");

    std::stringstream ss(&data[0]);
    if (!(ss >> result))
        throw std::runtime_error("Failed to parse data from stream");
}

// Overload for std::string:
template <typename TStream>
void ReadFixedWidthFieldFromStream(TStream& str, std::string& result, unsigned sz) 
{
    std::vector<char> data(sz);

    if (!str.read(&data[0], sz))
        throw std::runtime_error("Failed to read from stream");

    result = std::string(&data[0], sz);
}

Now your code can be much more succinct:

ReadFixedWidthFieldFromStream(edf, header.version, 8);
ReadFixedWidthFieldFromStream(edf, header.patinfo, 80);
ReadFixedWidthFieldFromStream(edf, header.recinfo, 80);
// etc.

This code is straightforward, simple, and easy to understand. If it's working, don't waste time changing it. I'm sure there's plenty of badly written, complex, and difficult to understand (and probably incorrect) code that should be fixed first :)

The Zen of Python doesn't mention DRY explicitly.

>>> import this
The Zen of Python, by Tim Peters

Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren't special enough to break the rules.
Although practicality beats purity.
Errors should never pass silently.
Unless explicitly silenced.
In the face of ambiguity, refuse the temptation to guess.
There should be one-- and preferably only one --obvious way to do it.
Although that way may not be obvious at first unless you're Dutch.
Now is better than never.
Although never is often better than *right* now.
If the implementation is hard to explain, it's a bad idea.
If the implementation is easy to explain, it may be a good idea.
Namespaces are one honking great idea -- let's do more of those!

For reading from file directly in strings see this question The rest is wrong. ~~but personally I think there's a better/cleaner way of doing this.~~

If you know the size of the structure don't use string, use primitive C types (and make sure the structure is packed). See these links: http://msdn.microsoft.com/en-us/library/2e70t5y1(v=vs.80).aspx & http://gcc.gnu.org/onlinedocs/gcc-3.2.3/gcc/Type-Attributes.html

I would do it this way for example (not sure about the size of each string but you get the idea):

struct Header { char version[8]; char patinfo[80]; char recinfo[80]; char start_date[8]; char start_time[8]; int header_bytes; char reserved[44]; int nrecs; double rec_duration; int nchannels; };

Once you have a packed structure you can read it directly from the file:

struct Header h; edf.read(&h,sizeof(struct Header));

For me this is the cleanest way to do it, but remember you must have your structure packed so that you have the guarantee that the structure in memory has the same size as the structure saved in the file - this is not very hard to see while testing.