Optimization of a c++ matrix/bitmap class_问答_开发者

Optimization of a c++ matrix/bitmap class

开发者 https://www.devze.com 2023-01-01 01:08 出处：网络

I am searching a 2D matrix (or bitmap) class which is flexible but also fast element access. The contents A flexible class should allow you to choose dimensions during runtime, and would look somethin

class Matrix 
{
public:   
  Matrix(int w, int h) :  
    data(new int[x*y]), width(w)  {}

 void SetElement(int x, int y, int val)  
 {  
   data[x+y*width] = val;  
 } 

 // ...
private: // symbols
  int width;  
  int* data;  
};

A faster often proposed solution using templates is (simplified):

template <int W, int H>
class TMatrix {
  TMatrix() data(new int[W*H])  {}

  void SetElement(int x, int y, int val)  
  {  
    data[x+y*W] = val;  
  } 

  private:
    int* data;
};

This is faster as the width can be "inlined" in the code. The first solution does not do this. However this is not very flexible anymore, as you can't change the size anymore at runtime.

So my question is: Is there a possibility to tell the compiler to generate faster code (like when using the template solution), when the size in the code is fixed and generate flexible code when its runtime dependend?

I tried to achieve this by writing "const" where ever possible. I tried it with gcc and VS2005, but no success. This kind of optimization would be useful for many o开发者_StackOverflow中文版ther similar cases.

I'd just go with the first version, myself.

But, if you really want to try to get the best of both worlds, you could have a Matrix class which holds a pointer to a polymorphic implementation type. For common sizes (say up to 4x4), you could point at template instantiations, and for larger you could point at an implementation that handled the general MxN case.

Having said all that, I think all the indirection & virtual calls would negate any performance improvement that might come from the templates. I don't think you can have your cake & eat it too, in this case.

If you're always dealing with data who's size is known at compile time (graphics/geometry vectors for example), you're better off with the template version (possibly storing the data in statically sized (non-heap allocated) arrays). If you need a general capability for arbitrary data, use the dynamic version instead.

Of course your needs may differ, but I'd skip the automatic generation and just go with a plain&simple set of "fixed" versions. E.g. Vector3, Vector4, Matrix3x3, Matrix3x4, and Matrix4x4. I suppose you could derive all of those from the templated version, but it won't make any particular performance difference.

Is there any particular reason why you want to be able to change the dimensions at runtime? Because I would suggest that just copying from one to the other wouldn't be terribly costly for the (what I suspect to be rare) instances when the change needs to occur.

Finally- something that I've seen done is to have named element access as well as the array, but you can only do that with "hard coded" types. Something like:

class Vector3
{
public:
   // other stuff...

   union
   {
      struct { float x, y, z; };
      float m[3];
   };
};

(that may not be entirely legal C++, hack to suit your compiler.)

Oh, even the templated version doesn't need to use new. Just declare the data as float data[W*H]; Getting it out of the heap will be a bigger performance boost than "optimizing out" a bit of math.

Not so much a complete answer, but some info that may help (if you're not already aware of these): Both OpenCV and Boost (uBLAS) have very good (fast/complete/full-featured) matrix implementations. I've not looked inside them to see how they set/get elements or resize after instantiation though.