How to speed up algorithm (Binarization, integral image)_问答_开发者

I wrote this procedure based on integral image algorithm described at this url

http://people.scs.carleton.ca/~roth/iit-publications-iti/docs/gerh-50002.pdf

Is there any way to do this code faster?

Pointers are much faster as dynamic arrays?

procedure TForm1.bBinarizationClick(Sender: TObject);
var
  iX1, iY1,
  iX2, iY2,
  ii, jj,
  s, s2,
  iSum, iCount,  index,
  iHeight, iWidth : Integer;
  iSize: Integer;

  row : ^TRGBTriple;
  black : TRGB开发者_JAVA百科Triple;
  aIntegralIm: array  of Integer;
  aGrays : array of Byte;

  startTime : Cardinal;

begin
  iWidth := bBitmap.Width;
  iHeight := bBitmap.Height;
  iSize := iWidth * iHeight;

  SetLength(aGrays, iSize);
  SetLength(aIntegralIm, iSize);

  black.rgbtRed  := (clBlack and $0000FF);
  black.rgbtGreen := (clBlack and $00FF00) shr 8;
  black.rgbtBlue := (clBlack and $FF0000) shr 16;

  bBitmap2.Canvas.Brush.Color := clWhite;
  bBitmap2.Canvas.FillRect(Rect(0, 0, bBitmap2.Width, bBitmap2.Height));

  s := Round(iWidth / TrackBar2.Position);
    s2 := Round(s / 2);

  startTime := GetTickCount();

  index := 0;

  for ii := 0 to iHeight - 1 do begin
     row := bBitmap.ScanLine[ii];
     for jj := 0 to iWidth - 1 do begin
       aGrays[index] := ((row.rgbtRed * 77 + row.rgbtGreen * 150 + row.rgbtBlue * 29) shr 8);
       inc(index);
       inc(row);
     end;
  end;


  for ii := 0 to iWidth - 1 do begin
     iSum := 0;
     for jj := 0 to iHeight - 1 do begin
       index := jj*iWidth+ii;
       iSum := iSum + aGrays[index];
       if ii = 0 then aIntegralIm[index] := iSum
       else aIntegralIm[index] := aIntegralIm[index - 1] + iSum;
     end;
  end;


  for jj := 0 to iHeight - 1 do begin
     row := bBitmap2.ScanLine[jj];
     for ii := 0 to iWidth - 1 do begin

       index := jj*iWidth+ii;

       iX1 := ii-s2;
       iX2 := ii+s2;
       iY1 := jj-s2;
       iY2 := jj+s2;

       if (iX1 < 0) then iX1 := 0;
         if (iX2 >= iWidth) then  iX2 := iWidth-1;
           if (iY1 < 0) then  iY1 := 0;
             if (iY2 >= iHeight) then  iY2 := iHeight-1;

       iCount := (iX2 - iX1) * (iY2 - iY1);

       iSum := aIntegralIm[iY2*iWidth+iX2]
              - aIntegralIm[iY1*iWidth+iX2]
              - aIntegralIm[iY2*iWidth+iX1]
              + aIntegralIm[iY1*iWidth+iX1];

       if (aGrays[index] * iCount) < (iSum * (100 - TrackBar1.Position) / 100) then  row^ :=  black;

       inc(row);

     end;
  end;

  ePath.Text :=  'Time: ' + inttostr(GetTickCount() - startTime) + ' ms';

  imgOryginal.Picture.Bitmap.Assign(bBitmap2);

end;

You can at least do a few simple things:

precalculate (100 - TrackBar1.Position) into a variable
Instead of division: / 100 use * 100 on the other side. You might not need any floating point values.
Use lookup tables for the following (care to explain the identation btw?):

Code:

if (iX1 < 0) then iX1 := 0;
if (iX2 >= iWidth) then  iX2 := iWidth-1;
if (iY1 < 0) then  iY1 := 0;
if (iY2 >= iHeight) then  iY2 := iHeight-1;

Try to keep the index and icremnet, decrement istead of multiplication: index := jj*iWidth+ii;

My guess is that the second loop is the slow bit.

The trick would be to avoid to recalculate everything in the second loop all the time

If S is constant (relative to the loop I mean, not absolute)

iy1,iy2 only change with the main(jj) loop and so do iy1*width (and iy2*width). Precalculate them, or optimize them away in the same way you do with row. (precalculate once per line, increment inbetween)
change the ii loop into three loops:
- the first bit where ix1=0
- the second where ix1=ii-s ix2=ii+s;
- the third where ix1=ii-s and ix2=iwidth-1

this removes a lot of checks out of the loops, to be done only once.

make a dedicated loop for the condition if (aGrays[index] * iCount) < (iSum * (100 - TrackBar1.Position) / 100) then row^ := black; so that it isn't evaluated for each pixel, since you can precalculate the area's where this happens ?
introduce pointers into the gray calculating loop so that you don't have to recalculate the index each pixel (but e.g. only for the row loop, incrementing a ptr per pixel)

If you are hardy, you can also precalculate the jump between lines. Keep in mind that abs(scanline[j]-scanline[i])-width is a metric for the number of alignment bytes per row.

Even more advanced is optimizing for cache effects on the level of your algorithm. See rotating bitmaps. In code to get an idea how this works. Some pointer tricks are demonstrated there too (but only for 8-bit elements)

I would first use a profiler to find out the CPU usage repartition, to figure out the smallest part(s) of code that would benefit the most from optimisation. Then I would adapt the effort according to the results. If some code represents 90% of the CPU load and is executed zillions of times, even extreme measures (recoding a few sequences using inline assembly language) might make sense.

Use the excellent and free SamplingProfiler to find out the bottleneck in your code. Then optimize and run the profiler again to find the next bottleneck. This approach is much better than guessing what's need to be optimized because even experts are often wrong about that.