I am building a class hierarchy that uses SSE intrinsic functions, so some members of the class need to be 16-byte aligned. For stack instances I can use __declspec(align(#)), like so:
typedef __declspec(align(16)) float Vector[4];
class MyClass{
...
private:
Vector v;
};
Now, since __declspec(align(#)) is a compile-time directive, the following code may result in an unaligned instance of Vector on the heap:
MyClass *myclass = new MyClass;
This, too, I know I can easily solve by overloading the new and delete operators to use _aligned_malloc and _aligned_free respectively, like so:
//inside MyClass:
public:
void* operator new (size_t size) {
    void* p = _aligned_malloc(size, 16);  // request a 16-byte aligned block
    if (p == 0) throw std::bad_alloc();   // report failure as the global operator new does
    return p;
}
void operator delete (void* p) {
    _aligned_free(p);                     // must pair with _aligned_malloc
}
...
So far so good, but here is my problem. Consider the following code:
class NotMyClass{ //Not my code, which I have little or no influence over
...
MyClass myclass;
...
};
int main(){
...
NotMyClass *nmc = new NotMyClass;
...
}
Since the myclass instance of MyClass is embedded by value in a dynamically allocated instance of NotMyClass, myclass WILL be 16-byte aligned relative to the beginning of nmc because of Vector's __declspec(align(16)) directive. But this is worthless, since nmc is allocated on the heap with NotMyClass's new operator, which doesn't necessarily ensure (and most probably does NOT ensure) 16-byte alignment.
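One way to observe this at run time is to test the address itself. The following is a minimal sketch and not part of the code above: is_aligned_16 is a hypothetical helper, and it assumes the compiler provides uintptr_t (taken from <cstdint> here; older compilers may need <stdint.h>).

```cpp
#include <cassert>   // for the assert shown in the usage comment
#include <cstdint>   // std::uintptr_t (assumed to be available)

// Hypothetical helper: true if p lies on a 16-byte boundary,
// i.e. the low four bits of its address are all zero.
inline bool is_aligned_16(const void* p) {
    return (reinterpret_cast<std::uintptr_t>(p) & 0x0F) == 0;
}

// Possible use inside a constructor, active in debug builds only:
//     assert(is_aligned_16(v) && "Vector member is not 16-byte aligned");
```

A check like this only detects a misaligned instance; it does nothing to prevent one.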
So far, I can only think of 2 approaches on how to deal with this problem:
1. Preventing MyClass users from being able to compile the following code:
MyClass myclass;
meaning instances of MyClass can only be created dynamically, using the new operator, thus ensuring that all instances of MyClass are truly dynamically allocated with MyClass's overloaded new. I asked in another thread how to accomplish this and got a few great answers: C++, preventing class instance from being created on the stack (during compilation)
2. Reverting from having Vector members in my class to having only pointers to Vector as members, which I will allocate and deallocate using _aligned_malloc and _aligned_free in the ctor and dtor respectively. This method seems crude and error-prone, since I am not the only programmer writing these classes (MyClass derives from a Base class and many of these classes use SSE).
However, since both solutions have been frowned upon in my team, I come to you for suggestions of a different solution.
If you're set against heap allocation, another idea is to over-allocate on the stack and manually align (manual alignment is discussed in this SO post). The idea is to allocate byte data (unsigned char) with a size guaranteed to contain an aligned region of the necessary size (+15), then find the aligned position by rounding down from the most-shifted region (x+15 - (x+15) % 16, or (x+15) & ~0x0F). I posted a working example of this approach with vector operations on codepad (for g++ -O2 -msse2). Here are the important bits:
class MyClass{
...
unsigned char dPtr[sizeof(float)*4+15]; //over-allocated data
float* vPtr; //float ptr to be aligned
public:
MyClass(void) :
vPtr( reinterpret_cast<float*>(
(reinterpret_cast<uintptr_t>(dPtr)+15) & ~ 0x0F
) )
{}
...
};
...
The constructor ensures that vPtr is aligned (note the order of members in the class declaration is important).
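The rounding step can be checked on a few concrete addresses. This is a small sketch; align_up_16 and align_up_16_examples are hypothetical names introduced only for illustration, and uintptr_t is assumed to be available from <cstdint>.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: round an address up to the next multiple of 16.
// Adding 15 pushes any unaligned address past the next boundary;
// clearing the low four bits then snaps it back onto that boundary.
inline std::uintptr_t align_up_16(std::uintptr_t x) {
    return (x + 15) & ~static_cast<std::uintptr_t>(0x0F);
}

// Worked examples of the arithmetic.
inline void align_up_16_examples() {
    assert(align_up_16(0x1000) == 0x1000);  // already aligned: unchanged
    assert(align_up_16(0x1001) == 0x1010);  // 0x1001 + 15 = 0x1010
    assert(align_up_16(0x100F) == 0x1010);  // 0x100F + 15 = 0x101E, low bits cleared
    assert(align_up_16(0x1010) == 0x1010);  // next boundary: unchanged
}
```

The result is never more than 15 bytes past the input, which is why 15 bytes of slack in dPtr are enough.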
This approach works (heap/stack allocation of containing classes is irrelevant to alignment), is portable-ish (I think most compilers provide a pointer-sized unsigned integer type, uintptr_t), and will not leak memory. But it's not particularly safe (you must be sure to keep the aligned pointer valid under copy, etc.), wastes (nearly) as much memory as it uses, and some may find the reinterpret_casts distasteful.
The risks of aligned operation/unaligned data problems could be mostly eliminated by encapsulating this logic in a Vector object, thereby controlling access to the aligned pointer and ensuring that it gets aligned at construction and stays valid.
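A sketch of such a wrapper might look like the following. AlignedVec4 and its member names are hypothetical, not taken from the codepad example. It relies on the same reinterpret_cast trick as the code above, but recomputes the aligned address on each access instead of storing a pointer, so a copied object can never point into the buffer of the object it was copied from.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical wrapper: owns over-allocated bytes and hands out an aligned float*.
class AlignedVec4 {
public:
    AlignedVec4() { for (int i = 0; i < 4; ++i) data()[i] = 0.0f; }

    // Copy element by element: the aligned region may start at a different
    // offset inside each object's buffer, so a raw byte copy would be wrong.
    AlignedVec4(const AlignedVec4& other) { copy_from(other); }
    AlignedVec4& operator=(const AlignedVec4& other) {
        if (this != &other) copy_from(other);
        return *this;
    }

    float* data() { return reinterpret_cast<float*>(aligned_address()); }
    const float* data() const { return reinterpret_cast<const float*>(aligned_address()); }

private:
    // Round the start of raw_ up to the next 16-byte boundary.
    std::uintptr_t aligned_address() const {
        const std::uintptr_t start = reinterpret_cast<std::uintptr_t>(raw_);
        const std::uintptr_t aligned = (start + 15) & ~static_cast<std::uintptr_t>(0x0F);
        assert(aligned - start < 16);  // the 15 bytes of slack always suffice
        return aligned;
    }
    void copy_from(const AlignedVec4& other) {
        for (int i = 0; i < 4; ++i) data()[i] = other.data()[i];
    }

    unsigned char raw_[sizeof(float) * 4 + 15];  // four floats plus 15 bytes of slack
};
```

Recomputing the address costs an add and a mask per access; caching it in a member, as in the code above, avoids that at the price of having to fix the pointer up in every copy operation.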
You can use "placement new."
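#include <new>      // declares the standard placement operator new; do not redefine it
#include <malloc.h> // _aligned_malloc, _aligned_free (MSVC)
int main() {
    void* p = _aligned_malloc(sizeof(NotMyClass), 16); // 16-byte aligned raw storage
    if (p == 0) return 1;                              // allocation failed
    NotMyClass* nmc = new (p) NotMyClass;              // construct the object in place
    // ...
    nmc->~NotMyClass();                                // destroy explicitly instead of calling delete
    _aligned_free(p);                                  // then release the storage
}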
Of course you need to take care when destroying the object: call the destructor explicitly and then release the space. You can't just call delete. You could use shared_ptr<> with a custom deleter to handle that automatically; whether that is acceptable depends on whether the overhead of a shared_ptr (or another pointer wrapper) is a problem for you.
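The shared_ptr idea could be sketched roughly as follows. Everything named here (alloc16, free16, AlignedDeleter, make_aligned_shared) is hypothetical. The sketch assumes a C++11 std::shared_ptr, and it assumes that posix_memalign is available wherever _aligned_malloc is not.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <memory>
#include <new>
#if defined(_MSC_VER)
#include <malloc.h>
#endif

// Hypothetical shims: _aligned_malloc on MSVC, posix_memalign elsewhere (assumed).
inline void* alloc16(std::size_t n) {
#if defined(_MSC_VER)
    return _aligned_malloc(n, 16);
#else
    void* p = 0;
    return posix_memalign(&p, 16, n) == 0 ? p : 0;
#endif
}
inline void free16(void* p) {
#if defined(_MSC_VER)
    _aligned_free(p);
#else
    std::free(p);
#endif
}

// Custom deleter: run the destructor, then release the aligned block.
template <class T>
struct AlignedDeleter {
    void operator()(T* p) const {
        if (p) { p->~T(); free16(p); }
    }
};

// Hypothetical factory: aligned raw storage plus placement new, owned by a shared_ptr.
template <class T>
std::shared_ptr<T> make_aligned_shared() {
    void* raw = alloc16(sizeof(T));
    if (!raw) throw std::bad_alloc();
    assert(reinterpret_cast<std::uintptr_t>(raw) % 16 == 0);
    T* obj = 0;
    try { obj = new (raw) T; }            // construct in place
    catch (...) { free16(raw); throw; }   // do not leak if the constructor throws
    return std::shared_ptr<T>(obj, AlignedDeleter<T>());
}
```

With this in place, `std::shared_ptr<NotMyClass> nmc = make_aligned_shared<NotMyClass>();` would replace the manual destructor call and free in the example above.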
The upcoming C++0x standard proposes facilities for dealing with raw memory. They are already incorporated in VC++2010 (within the tr1 namespace).
std::tr1::alignment_of // get the alignment
std::tr1::aligned_storage // get aligned storage of required dimension
These are types; you can use them like so:
static const size_t floatalign = std::tr1::alignment_of<float>::value; // demo only
typedef std::tr1::aligned_storage<sizeof(float)*4, 16>::type raw_vector;
// first parameter is size, second is desired alignment
Then you can declare your class:
class MyClass
{
public:
float* AccessVector(); // returns the raw storage viewed as four floats
private:
raw_vector mVector; // alignment guaranteed
};
Finally, you need some cast to manipulate it (it's raw memory until now):
float* MyClass::AccessVector()
{
return reinterpret_cast<float*>(&mVector); // view the raw storage as float[4]
}
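On a compiler with C++11 support, the same requirement can also be spelled with the alignas keyword. Note that this is a swapped-in alternative, not the tr1 code above, and it assumes alignas and alignof are available. The debug assert is an addition for illustration only.

```cpp
#include <cassert>
#include <cstdint>

class MyClass {
public:
    // Returns the member array; the assert fires in debug builds if this
    // particular instance ended up at a misaligned address.
    float* AccessVector() {
        assert(reinterpret_cast<std::uintptr_t>(mVector) % 16 == 0);
        return mVector;
    }
private:
    alignas(16) float mVector[4];  // the type now carries a 16-byte alignment requirement
};

// The requirement propagates to the enclosing class.
static_assert(alignof(MyClass) == 16, "MyClass should require 16-byte alignment");
```

As with __declspec(align(16)), this states the requirement on the type; it does not by itself change how an enclosing object is allocated on the heap.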