开发者

Is there a way to enforce specific endianness for a C or C++ struct?

开发者 https://www.devze.com 2023-03-21 05:07 出处:网络
I\'ve seen a few questions and answers regarding to the endianness of structs, but they were about detecting the endianness of a system, or converting data between the two different endianness.

I've seen a few questions and answers regarding to the endianness of structs, but they were about detecting the endianness of a system, or converting data between the two different endianness.

What I would like to now, however, if there is a way to enforce specific endianness of a given struct. Are there some good compiler directives or other simple solutions besides rewriting the whole thing out of a lot of macros manipulating on bitfields?

A general solution would be nice, but I would be happy with a specific gcc solution as well.

Edit:

Thank you for all the comments pointing out why it's 开发者_运维知识库not a good idea to enforce endianness, but in my case that's exactly what I need.

A large amount of data is generated by a specific processor (which will never ever change, it's an embedded system with a custom hardware), and it has to be read by a program (which I am working on) running on an unknown processor. Byte-wise evaluation of the data would be horribly troublesome because it consists of hundreds of different types of structs, which are huge, and deep: most of them have many layers of other huge structs inside.

Changing the software for the embedded processor is out of the question. The source is available, this is why I intend to use the structs from that system instead of starting from scratch and evaluating all the data byte-wise.

This is why I need to tell the compiler which endianness it should use, it doesn't matter how efficient or not will it be.

It does not have to be a real change in endianness. Even if it's just an interface, and physically everything is handled in the processors own endianness, it's perfectly acceptable to me.


The way I usually handle this is like so:

#include <arpa/inet.h> // for ntohs() etc.
#include <stdint.h>

class be_uint16_t {
public:
        be_uint16_t() : be_val_(0) {
        }
        // Transparently cast from uint16_t
        be_uint16_t(const uint16_t &val) : be_val_(htons(val)) {
        }
        // Transparently cast to uint16_t
        operator uint16_t() const {
                return ntohs(be_val_);
        }
private:
        uint16_t be_val_;
} __attribute__((packed));

Similarly for be_uint32_t.

Then you can define your struct like this:

struct be_fixed64_t {
    be_uint32_t int_part;
    be_uint32_t frac_part;
} __attribute__((packed));

The point is that the compiler will almost certainly lay out the fields in the order you write them, so all you are really worried about is big-endian integers. The be_uint16_t object is a class that knows how to convert itself transparently between big-endian and machine-endian as required. Like this:

be_uint16_t x = 12;
x = x + 1; // Yes, this actually works
write(fd, &x, sizeof(x)); // writes 13 to file in big-endian form

In fact, if you compile that snippet with any reasonably good C++ compiler, you should find it emits a big-endian "13" as a constant.

With these objects, the in-memory representation is big-endian. So you can create arrays of them, put them in structures, etc. But when you go to operate on them, they magically cast to machine-endian. This is typically a single instruction on x86, so it is very efficient. There are a few contexts where you have to cast by hand:

be_uint16_t x = 37;
printf("x == %u\n", (unsigned)x); // Fails to compile without the cast

...but for most code, you can just use them as if they were built-in types.


A bit late to the party but with current GCC (tested on 6.2.1 where it works and 4.9.2 where it's not implemented) there is finally a way to declare that a struct should be kept in X-endian byte order.

The following test program:

#include <stdio.h>
#include <stdint.h>

struct __attribute__((packed, scalar_storage_order("big-endian"))) mystruct {
    uint16_t a;
    uint32_t b;
    uint64_t c;
};


int main(int argc, char** argv) {
    struct mystruct bar = {.a = 0xaabb, .b = 0xff0000aa, .c = 0xabcdefaabbccddee};

    FILE *f = fopen("out.bin", "wb");
    size_t written = fwrite(&bar, sizeof(struct mystruct), 1, f);
    fclose(f);
}

creates a file "out.bin" which you can inspect with a hex editor (e.g. hexdump -C out.bin). If the scalar_storage_order attribute is suppported it will contain the expected 0xaabbff0000aaabcdefaabbccddee in this order and without holes. Sadly this is of course very compiler specific.


No, I dont think so.

Endianness is the attribute of processor that indicates whether integers are represented from left to right or right to left it is not an attribute of the compiler.

The best you can do is write code which is independent of any byte order.


Try using
#pragma scalar_storage_order big-endian to store in big-endian-format
#pragma scalar_storage_order little-endian to store in little-endian
#pragma scalar_storage_order default to store it in your machines default endianness

Read more here


No, there's no such capability. If it existed that could cause compilers to have to generate excessive/inefficient code so C++ just doesn't support it.

The usual C++ way to deal with serialization (which I assume is what you're trying to solve) this is to let the struct remain in memory in the exact layout desired and do the serialization in such a way that endianness is preserved upon deserialization.


I am not sure if the following can be modified to suit your purposes, but where I work, we have found the following to be quite useful in many cases.

When endianness is important, we use two different data structures. One is done to represent how it expected to arrive. The other is how we want it to be represented in memory. Conversion routines are then developed to switch between the two.

The workflow operates thusly ...

  1. Read the data into the raw structure.
  2. Convert to the "raw structure" to the "in memory version"
  3. Operate only on the "in memory version"
  4. When done operating on it, convert the "in memory version" back to the "raw structure" and write it out.

We find this decoupling useful because (but not limited to) ...

  1. All conversions are located in one place only.
  2. Fewer headaches about memory alignment issues when working with the "in memory version".
  3. It makes porting from one arch to another much easier (fewer endian issues).

Hopefully this decoupling can be useful to your application too.


A possible innovative solution would be to use a C interpreter like Ch and force the endian coding to big.


Boost provides endian buffers for this.

For example:

#include <boost/endian/buffers.hpp>
#include <boost/static_assert.hpp>

using namespace boost::endian;

struct header {
    big_int32_buf_t     file_code;
    big_int32_buf_t     file_length;
    little_int32_buf_t  version;
    little_int32_buf_t  shape_type;
};
BOOST_STATIC_ASSERT(sizeof(h) == 16U);


Maybe not a direct answer, but having a read through this question can hopefully answer some of your concerns.


You could make the structure a class with getters and setters for the data members. The getters and setters are implemented with something like:

int getSomeValue( void ) const {
#if defined( BIG_ENDIAN )
    return _value;
#else
    return convert_to_little_endian( _value );
#endif
}

void setSomeValue( int newValue) {
#if defined( BIG_ENDIAN )
    _value = newValue;
#else
    _value = convert_to_big_endian( newValue );
#endif
}

We do this sometimes when we read a structure in from a file - we read it into a struct and use this on both big-endian and little-endian machines to access the data properly.


There is a data representation for this called XDR. Have a look at it. http://en.wikipedia.org/wiki/External_Data_Representation

Though it might be a little too much for your Embedded System. Try searching for an already implemented library that you can use (check license restrictions!).

XDR is generally used in Network systems, since they need a way to move data in an Endianness independent way. Though nothing says that it cannot be used outside of networks.

0

精彩评论

暂无评论...
验证码 换一张
取 消