I'm writing an application that takes a series of exposures of a target, computes their average, and saves the resulting image. This technique is used extensively in astrophotography to reduce noise in the final image. Basically, one computes the average at each pixel and writes that value to the output file.
The number of exposures can be quite high, from 20 to 30 (sometimes even more), and with today's large CCD sensors the resolution, too, can be quite high. So the amount of data can be very large.
My question is, when it comes to performance, should I read the images row by row (Method #1) or should I read every image in its entirety into memory (Method #2)? With the former method, I have to load every corresponding row: if I have 10 images and I'm processing row #1, I read the first row from each of the 10 images, compute their average and write out that row.
With the latter method, I read all the images in their entirety, compute the averages, and write out the entire image in one go.
In theory, the latter method ought to be much faster but much more memory intensive. In practice, however, I've found that the difference in performance isn't great, which was puzzling. At most, Method #2 was only 2 to 3 seconds faster than Method #1. However, Method #2 was using up to 1.3 GB of memory for 24 8-megapixel images, while Method #1 peaked at about 70 MB. On average, both methods take about 20 seconds to process 24 8-megapixel images.
I am writing this in Objective-C with a good amount of C thrown in when calling CFITSIO.
Here's Method #1:
pixelRows = (double**)malloc(self.numberOfImages * sizeof(double*)); // one row buffer per input image
for(i = 0; i < self.numberOfImages; i++)
{
    pixelRows[i] = (double*)malloc(width * sizeof(double));
}
apix = (double*)malloc(width * sizeof(double)); // holds the averaged output row

for(firstpix[1] = 1; firstpix[1] <= size[1]; firstpix[1]++) // FITS pixel coordinates are 1-based
{
    [self gatherRowsFromImages:firstpix[1] withRowWidth:theWidth thePixelMap:pixelRows];
    [self averageRows:pixelRows width:width theAveragedRow:apix];
    fits_write_pix(outfptr, TDOUBLE, firstpix, width, apix, &status);
    //NSLog(@"Row %ld written.", firstpix[1]);
}
fits_close_file(outfptr, &status);
NSLog(@"End");
if(!status)
{
    NSLog(@"File written successfully.");
}
for(i = 0; i < self.numberOfImages; i++)
{
    free(pixelRows[i]);
}
free(pixelRows);
free(apix);
Here's Method #2:
imageArray = (double**)malloc(files.count * sizeof(double*));
for(i = 0; i < files.count; i++)
{
    imageArray[i] = (double*)malloc(size[0] * size[1] * sizeof(double)); // whole image as doubles
    fits_read_pix(fptr[i], TDOUBLE, firstpix, size[0] * size[1], NULL, imageArray[i], NULL, &status);
    //NSLog(@"%d", status);
}
int fileIndex;
NSLog(@"%lu", (unsigned long)files.count); // files.count is NSUInteger, so %lu rather than %d
apix = (double*)malloc(size[0] * size[1] * sizeof(double));
for(i = 0; i < (size[0] * size[1]); i++)
{
    apix[i] = 0.0;
    for(fileIndex = 0; fileIndex < files.count; fileIndex++)
    {
        apix[i] = apix[i] + imageArray[fileIndex][i];
    }
    //NSLog(@"%f", apix[i]);
    apix[i] = apix[i] / files.count;
}
fits_create_file(&outfptr, [outPath UTF8String], &status);
fits_copy_header(fptr[0], outfptr, &status);
fits_write_pix(outfptr, TDOUBLE, firstpix, size[0] * size[1], apix, &status);
fits_close_file(outfptr, &status);
for(i = 0; i < files.count; i++) // release the per-image buffers
{
    free(imageArray[i]);
}
free(imageArray);
free(apix);
Any suggestions? Am I expecting too much of a gain from reading in every image in its entirety?
I would always go for the row-by-row approach, since it scales. It may also be faster because the memory footprint is smaller, which means the OS never has to swap anything out to disk just to accommodate your memory-hungry tool.
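To put numbers on it: 24 images × 8 million pixels × 8 bytes per double works out to roughly 1.5 GB of buffers for Method #2, which lines up with the 1.3 GB you measured, while Method #1 never holds more than 24 single rows plus one output row at a time.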
Furthermore, to optimize the row-by-row approach, you should consider reading images in blocks of 8 rows (or some other number) at a time. JPEG, for example, is stored in 8×8 blocks, so reading fewer than 8 rows at a time would be pointless there. Of course this depends on the image format and the library you are using.
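With CFITSIO a block of rows is a single call, since fits_read_pix reads any number of contiguous pixels starting at a given coordinate and FITS images are stored row after row. A minimal sketch of the idea; rowStart, rowsPerBlock and blockBuffer are hypothetical names, and the last block would need clamping at the bottom edge of the image:

long blockpix[2];
blockpix[0] = 1;        /* start at the first column         */
blockpix[1] = rowStart; /* first row of this block (1-based) */
fits_read_pix(fptr[i], TDOUBLE, blockpix, width * rowsPerBlock,
              NULL, blockBuffer[i], NULL, &status);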
There are also other considerations regarding the CPU's use of cache memory. Frequently used memory locations don't have to be fetched from "slow" main memory; they can stay close to the CPU. There are several levels of cache, and their sizes vary per CPU type (the largest is typically 8 or 16 MB at the time of writing).
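As a rough check against your numbers (assuming a row width of about 4,000 pixels, which is in the right neighborhood for 8-megapixel frames): Method #1's working set is roughly 24 rows × 4,000 doubles × 8 bytes ≈ 750 KB, small enough to stay resident in a large L2/L3 cache, whereas Method #2 streams its full 1.5 GB through the cache with essentially no reuse.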
Another thing to consider is the code that does the actual averaging. Tuning this can also gain you a lot, especially for the kind of operation you're doing; look at SSE and related topics. Integer arithmetic will probably also beat floating point, and using bit shifts for the division might be faster than true division, though a shift only lets you divide by 2^n.
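As an illustration of the SSE suggestion, here is a minimal sketch that averages corresponding pixels from several row buffers using SSE2 double-precision intrinsics. It assumes the width is even and the buffers are 16-byte aligned (malloc on OS X returns 16-byte-aligned blocks); average_rows_sse2 is a hypothetical helper, not part of the code above:

#include <emmintrin.h> /* SSE2 intrinsics */

/* Average `count` row buffers of `n` doubles each into `out`,
   processing two pixels per iteration. */
static void average_rows_sse2(double **rows, int count, size_t n, double *out)
{
    const __m128d scale = _mm_set1_pd(1.0 / count);
    size_t i;
    int k;
    for(i = 0; i < n; i += 2)
    {
        __m128d acc = _mm_setzero_pd();
        for(k = 0; k < count; k++)
        {
            acc = _mm_add_pd(acc, _mm_load_pd(&rows[k][i])); /* aligned load of 2 doubles */
        }
        _mm_store_pd(&out[i], _mm_mul_pd(acc, scale)); /* multiply by 1/count and store */
    }
}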