iPhone SDK - Optimize a for loop

I'm developing an image processing application and I'm looking for advice on tuning my code.

I need to split the image into blocks (80x80) and, for each block, calculate the average color.

My first method contains the main loops, from which the second method is called:

- (NSArray*)getRGBAsFromImage:(UIImage *)image {
int width       = image.size.width;
int height  = image.size.height;

int blocPerRow  = 80;
int blocPerCol  = 80;

int pixelPerRowBloc = width  / blocPerRow;
int pixelPerColBloc = height / blocPerCol;

int xx,yy;

// Row loop
for (int i=0; i<blocPerRow; i++) {

    xx = (i * pixelPerRowBloc) + 1;

    // Column loop
    for (int j=0; j<blocPerCol; j++) {

        yy = (j * pixelPerColBloc) +1;

        [self getRGBAsFromImageBloc:image 
                    atX:xx 
                    andY:yy 
                    withPixelPerRow:pixelPerRowBloc 
                    AndPixelPerCol:pixelPerColBloc];
    }
}
// return my NSArray not done yet !
}

My second method walks through the pixel block and returns a ColorStruct:

- (ColorStruct*)getRGBAsFromImageBloc:(UIImage*)image 
                            atX:(int)xx 
                            andY:(int)yy 
                            withPixelPerRow:(int)pixelPerRow 
                            AndPixelPerCol:(int)pixelPerCol {

// First get the image into your data buffer
CGImageRef imageRef = [image CGImage];

NSUInteger width = CGImageGetWidth(imageRef);
NSUInteger height = CGImageGetHeight(imageRef);

CGColorSpaceRef colorSpace = CGColorSpaceCreateDeviceRGB();

unsigned char *rawData = malloc(height * width * 4);

NSUInteger bytesPerPixel = 4;
NSUInteger bytesPerRow = bytesPerPixel * width;

    NSUInteger bitsPerComponent = 8;

CGContextRef context = CGBitmapContextCreate(rawData, width, height,
                bitsPerComponent, bytesPerRow, colorSpace,
                kCGImageAlphaPremultipliedLast | kCGBitmapByteOrder32Big);

CGColorSpaceRelease(colorSpace);

CGContextDrawImage(context, CGRectMake(0, 0, width, height), imageRef);
CGContextRelease(context);

// Now your rawData contains the image data in the RGBA8888 pixel format.
int byteIndex = (bytesPerRow * yy) + xx * bytesPerPixel;

    int red = 0;
    int green = 0;
    int blue = 0;
    int alpha = 0;
    int currentAlpha;

    // bloc loop
    for (int i = 0 ; i < (pixelPerRow*pixelPerCol) ; ++i) {
        currentAlpha = rawData[byteIndex + 3];

        red   += (rawData[byteIndex]        )   * currentAlpha;
        green += (rawData[byteIndex + 1]) * currentAlpha;
        blue  += (rawData[byteIndex + 2]) * currentAlpha;
        alpha += currentAlpha;

        byteIndex += 4;

        if ( i == pixelPerRow ) {
            byteIndex += (width-pixelPerRow) * 4;
        }
    }
    red     /= alpha;
    green /= alpha;
    blue    /= alpha;

    ColorStruct *bColorStruct = newColorStruct(red, blue, green);

    free(rawData);

    return bColorStruct;
   }

ColorStruct:

typedef struct {
    int red;
    int blue;
    int green;
} ColorStruct;

with constructor:

ColorStruct *newColorStruct(int red, int blue, int green) {
    ColorStruct *ret = malloc(sizeof(ColorStruct));
    ret->red = red;
    ret->blue = blue;
    ret->green = green;
    return ret;
}

As you can see, I have three levels of loops: the row loop, the column loop, and the block loop.

I have tested my code and it takes about 5 to 6 seconds for a 320x480 picture.

Any help is welcome.

Thanks, Bahaaldine

This seems like a perfect problem to hand to Grand Central Dispatch?

I think the main problem in this code is that there are too many image reads. The entire image is loaded into memory for every(!) block, and malloc is expensive too. You should load the image data once (cache it) and then use that memory in getRGBAsFromImageBloc(). With 80x80-pixel blocks, a 320x480 picture has 4 x 6 = 24 blocks (and the code as posted, with 80 blocks per row and per column, makes 6400 calls), so caching alone can speed up your app many times over.
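As a rough illustration of the caching idea, here is a minimal C sketch of a per-block helper that reads from a buffer decoded once by the caller, instead of redrawing the image on every call. The names `BlockColor` and `averageBlock` are hypothetical, and the buffer is assumed to be RGBA8888 with no row padding, as in the question. It keeps the question's alpha weighting, but uses zero-based offsets and an explicit row/column loop in place of the single index with the `i == pixelPerRow` check, which, as posted, appears to jump to the next row only once.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical value type for this sketch; returned by value, so no malloc. */
typedef struct {
    int red;
    int green;
    int blue;
} BlockColor;

/*
 * Alpha-weighted average color of one block, read from a buffer that the
 * caller decoded ONCE (assumed RGBA8888, `width` pixels per row, no padding).
 * (x0, y0) is the block's top-left pixel, zero-based. The caller must make
 * sure the block lies entirely inside the image.
 */
static BlockColor averageBlock(const uint8_t *pixels, size_t width,
                               size_t x0, size_t y0,
                               size_t blockW, size_t blockH)
{
    assert(pixels != NULL && x0 + blockW <= width);

    uint64_t red = 0, green = 0, blue = 0, alpha = 0;

    for (size_t y = y0; y < y0 + blockH; y++) {
        for (size_t x = x0; x < x0 + blockW; x++) {
            const uint8_t *p = pixels + (y * width + x) * 4;
            uint64_t a = p[3];
            red   += p[0] * a;
            green += p[1] * a;
            blue  += p[2] * a;
            alpha += a;
        }
    }

    BlockColor result = {0, 0, 0};
    if (alpha > 0) {   /* fully transparent block: avoid dividing by zero */
        result.red   = (int)(red / alpha);
        result.green = (int)(green / alpha);
        result.blue  = (int)(blue / alpha);
    }
    return result;
}
```

The caller would create the bitmap context and draw the image a single time (the same CGBitmapContextCreate and CGContextDrawImage calls as in the question), then call this helper once per block and free the buffer at the end.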

At the end of the day, taking an image and performing three multiplies and five additions on each pixel sequentially is always going to be relatively slow.

Luckily, what you're doing can be thought of as a special case of interpolating an image from one size to another: the average pixel of an image is the same as that image resized to 1x1 (assuming the resizing uses some form of linear interpolation, which is usually the standard way to do it). The iPhone's graphics libraries include a few highly optimized options for doing that (or at least more optimized than you're likely to get without enormous effort). At first I'd try using the Quartz functions to resize an image:

CGImageRef sourceImage = yourImage;

int numBytesPerPixel = 4;
u_char* scaledImageData = (u_char*)malloc(numBytesPerPixel);

// The color space comes from a "Get" function, so we do not own it and must not release it.
CGColorSpaceRef colorspace = CGImageGetColorSpace(sourceImage);
CGContextRef context = CGBitmapContextCreate (scaledImageData, 1, 1, 8, numBytesPerPixel, colorspace, kCGImageAlphaNoneSkipFirst);
CGContextDrawImage(context, CGRectMake(0,0,1,1), sourceImage);
CGContextRelease(context);

// With kCGImageAlphaNoneSkipFirst, byte 0 is the skipped (unused) byte, not alpha.
int r = scaledImageData[1];
int g = scaledImageData[2];
int b = scaledImageData[3];

free(scaledImageData);

(This just scales the original image down to 1 pixel; it doesn't show the cropping of the sub-regions. Unfortunately I don't have time for that code right now. If you try to implement it and get stuck, add a comment and I can show you how you would do that.)

If that doesn't work, you could always try using OpenGL ES to do this (create a texture out of the part of your image you need to scale, render it to a 1x1 buffer, and read the result back from the buffer). This is a lot more complicated, but it gives you access to the GPU, which might be a lot faster for large images.

Hope that makes sense and helps...

P.S. - Definitely follow y0prst's suggestion and only read the image in once - that is an easy fix that is going to buy you a ton of performance.

P.P.S. - I haven't tested the code, so the usual caveats apply.

You're inspecting every single pixel - something that, it would seem, is going to take roughly the same amount of time no matter how you loop through it (provided you only inspect each pixel once).

I would suggest sampling within the block, for example only every n-th pixel. That reduces the loop time (and the accuracy), and it gives you an adjustable granularity.
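A minimal sketch of that sampling idea in C, assuming the image has already been decoded into an RGBA8888 buffer with no row padding. The names `SampledColor` and `averageBlockSampled` are hypothetical, and alpha is ignored here to keep the example short:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical result type for this sketch. */
typedef struct {
    int red;
    int green;
    int blue;
} SampledColor;

/*
 * Approximate average color of a block by visiting only every `step`-th
 * pixel in both directions (step == 1 visits every pixel).
 * Assumes an RGBA8888 buffer with `width` pixels per row and no padding.
 */
static SampledColor averageBlockSampled(const uint8_t *pixels, size_t width,
                                        size_t x0, size_t y0,
                                        size_t blockW, size_t blockH,
                                        size_t step)
{
    assert(pixels != NULL && step > 0 && x0 + blockW <= width);

    uint64_t red = 0, green = 0, blue = 0, count = 0;

    for (size_t y = y0; y < y0 + blockH; y += step) {
        const uint8_t *row = pixels + y * width * 4;
        for (size_t x = x0; x < x0 + blockW; x += step) {
            const uint8_t *p = row + x * 4;
            red   += p[0];
            green += p[1];
            blue  += p[2];
            count++;
        }
    }

    SampledColor result = {0, 0, 0};
    if (count > 0) {   /* empty block: nothing was sampled */
        result.red   = (int)(red / count);
        result.green = (int)(green / count);
        result.blue  = (int)(blue / count);
    }
    return result;
}
```

With `step` set to 2, roughly a quarter of the pixels are read. The result is only an estimate: on a block whose color changes quickly from pixel to pixel it can differ noticeably from the true average.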

Now, if there is an existing algorithm for computing the average of a group of pixels - that would be something to consider as an alternative.

You can speed things up by not calling a method in the middle of your loop. Just include the code inline.

ADDED: Also, if you have enough memory, you might try drawing the image only once rather than repeating it in a loop.

After you do that, you can try hoisting some of the multiplies out of the inner loop as well for a little additional performance (although the compiler may optimize some of this for you).
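One way these suggestions might look together, as a sketch in C rather than a tested drop-in: a single function with no per-block method call, walking an already decoded RGBA8888 buffer (assumed to have no row padding) once in memory order, with the per-row offsets computed outside the pixel loop. The names `BlockSum` and `sumBlocks` are hypothetical, and alpha is ignored for brevity:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical accumulator: per-block channel sums. */
typedef struct {
    uint64_t red;
    uint64_t green;
    uint64_t blue;
} BlockSum;

/*
 * Accumulate channel sums for all blocks in one pass over the image.
 * sums: blocksPerRow * blocksPerCol entries, row-major, zeroed by the caller.
 * Pixels beyond the last whole block (when width or height is not an exact
 * multiple of the block count) are ignored.
 */
static void sumBlocks(const uint8_t *pixels, size_t width, size_t height,
                      size_t blocksPerRow, size_t blocksPerCol,
                      BlockSum *sums)
{
    assert(pixels != NULL && sums != NULL);
    assert(blocksPerRow > 0 && blocksPerCol > 0);

    const size_t blockW = width / blocksPerRow;
    const size_t blockH = height / blocksPerCol;
    const size_t bytesPerRow = width * 4;   /* constant for the whole image */

    for (size_t y = 0; y < blockH * blocksPerCol; y++) {
        /* Hoisted out of the pixel loop: both values depend only on y. */
        const uint8_t *p = pixels + y * bytesPerRow;
        BlockSum *rowSums = sums + (y / blockH) * blocksPerRow;

        for (size_t bx = 0; bx < blocksPerRow; bx++) {
            uint64_t r = 0, g = 0, b = 0;   /* locals for the inner loop */
            for (size_t x = 0; x < blockW; x++, p += 4) {
                r += p[0];
                g += p[1];
                b += p[2];
            }
            rowSums[bx].red   += r;
            rowSums[bx].green += g;
            rowSums[bx].blue  += b;
        }
    }
}
```

Dividing each sum by `blockW * blockH` afterwards gives the per-block averages. Whether this is measurably faster than a per-block helper depends on the compiler and the device, so it is worth profiling before and after.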