Potentially long loop and declaring variables inside_问答_开发者

I've recently written a dynamic program that calculates the similarity (modified edit distance) between two sequences of DNA strands (can be lengthy).

My code is like (not actual code since its an assignment):

while(!file.eof){
   string line;
   int sizeY, sizeX;

   //get first strand
   getline(db, line)

   //second strand
   getline(db, line)

   double ** ary = new double[sizeY];
   //loop to initialize array

   for(i to sizeY)
   {
      for(i to sizex)
      {
            pair<string,string> p,d;
            p.first = "A";
            p.second = "T";
            d.first = "G";
            d.second = "C";
            //do some comparisons
      }
   }
}

开发者_开发问答

The code above will take approximately 40 minutes to complete on a file with ~2400 lines. If I move the pair p,d and assignments outside the nested for-loop and run the exact same file, it will complete in about ~1 minute.

I've read in other threads that the performance is pretty much the same. I've also compiled it with -O2.

Why is the code above so much slower?

Consider the various allocations/deallocations that have to happen on each and every iteration of the inner loop.

Allocate a pair object on the stack
Allocate four empty strings, each probably a 1 byte allocation on the heap
Four string assignments which may require 4 heap deallocations and new allocations
Destruction of the strings involving 4 heap deallocations
Destruction of the pair object

Ignoring the stack allocations (which should be relatively cheap) that is a total of 8 heap allocations and another 8 deallocations (or a best case of 4/4). If this is a debug build there may be additional overhead in checking each heap operation.

If your sizeX/sizeY constants are 2400 then you're doing a total of of 92 million heap operations. If you're lucky each of those operations will take around the same time since you're allocating the same sized object each loop. If you're unlucky then some heap operations may take significantly longer to accomplish due to heap fragmentation.

The obvious solution, as you've found, is to put the variable definition and assignment outside the loop. You only need to reassign the pair values if they are being overwritten within the loop at some point.

Generic answer: It would appear you're using gcc (that is to say g++); you can always do g++ -S [stuff] to see what G++ is doing to your code (assuming you can read assembly well enough to get by).

Specific answer: I'm surprised the difference is 40 times, but in your code, every time you finish a loop, it has to call create_new_pair twice (and I would have thought it would have had to do a little bit of cleanup to "free" the old pair, but given that its on the stack, I guess that's not as hard as I would have thought, or at least I'm not seeing it... reading code from Gcc used to be a lot easier than reading C++ code is at the moment)

It is probably becouse the variable is an object. Since p and d are not a primitive type, unless the compiler inline it's constructor and destructor (which may happen if you use -O3 instead of -O2), it will construct and destruct the two std::pair (and as consequence four std::string) each iteration. If it was a primitive variable (like int), the compiler could optimize this even if you don't have inline optimization enabled.

EDIT: Note that since std::string internally use heap allocations, not even inline would optimize those allocations (but you still save some overhead with the inline). For a std::pair of int, with -O3, the performance should be the same inside or outside the loop.

In a compiled language with a good compiler (that does at least mediocre optimization) having variables declared inside a loop is never going to be a "loser" and often (especially for compilers with only modest optimization) will be a "winner".

With interpreted languages it's likely to be different, though. Each time through the loop the interpreter will need to allocate the variable locations, and that could be expensive.

You could also have a "loser" situation with a poorly-designed compiled environment where allocating variables on the stack is expensive.

Though with any of these scenarios I'd be at a loss to explain a 40:1 difference. I suspect that the omitted code may contain some important clues.

[Ah, on re-reading (and possibly on the poster's re-editing) I see that it's not simply variable DECLARATION, but OBJECT creation that's being moved outside of the loop.]