I have this code to read 64MB of binary data into memory:
#include <cstdio>

#define SIZE 8192

// Reads SIZE*SIZE bytes (64MB) from fp into a newly allocated buffer.
char* readFromFile(FILE* fp)
{
    char* memBlk = new char[SIZE*SIZE];
    fread(memBlk, 1, SIZE*SIZE, fp);
    return memBlk;
}

int main()
{
    FILE* fp = fopen("/some_path/file.bin", "rb+");
    char* read_data = readFromFile(fp);
    // do something on read data
    // EDIT: It is a matrix, so I would be reading row-wise.
    delete[] read_data;  // free the buffer returned by readFromFile
    fclose(fp);
}
When I use this code independently, the runtime is less than 1 second. However, when I put the exact same code (just to benchmark) in one of our applications, the runtime is 146 seconds. The application is quite bulky, with up to 5 GB of memory usage.
Some of it can be explained by the current memory usage, cache misses and other factors, but a difference by a factor of 146 sounds unreasonable to me.
Can someone explain this?
Memory mapping may improve performance. Any other suggestions are also welcome.
Thanks.
Machine info:
Linux my_mach 2.6.9-67.ELsmp #1 SMP Wed Nov 7 13:56:44 EST 2007 x86_64 x86_64 x86_64 GNU/Linux
EDIT:
Thanks for your answers. However, I missed the fact that the place where I inserted the code was itself being called 25 times, so it is not exactly a factor of 146.
Anyway, the answers were helpful. Thanks for your time.
It looks like the additional memory your code needs induces thrashing in the application, which is probably already running at the limit.
If you want to "do something" with the file, you can either:
Process the file blockwise
Use mmap() or a similar memory-mapping technique on your operating system to map the file into memory if you need more complicated access. mmap uses the buffer cache as backing store, so the pages are backed by the file itself instead of by swap space. Using mmap is usually the fastest and easiest way to access a file. It is not totally portable, but it can be made portable across the UNIX-like group of OSes (e.g. all BSDs, Linux, Solaris, and Mac OS X).
You did not specify what access pattern "do something" will have, so it's hard to recommend a specific technique.
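If the row-wise access from the question's EDIT is all that is needed, blockwise processing could look roughly like the sketch below. forEachRow is a hypothetical helper, and it assumes each row can be handled on its own, so only one row buffer is ever allocated instead of the full 64MB.

```cpp
#include <cstddef>
#include <cstdio>
#include <vector>

// Hypothetical helper: reads `fp` one row at a time and calls
// process(rowData, rowIndex) for every complete row of `rowBytes` bytes.
// Returns the number of complete rows processed; a trailing partial row
// is ignored.
template <typename RowFn>
std::size_t forEachRow(FILE* fp, std::size_t rowBytes, RowFn process)
{
    if (fp == nullptr || rowBytes == 0)
        return 0;

    std::vector<char> row(rowBytes);  // single buffer, reused for every row
    std::size_t rows = 0;
    while (fread(row.data(), 1, rowBytes, fp) == rowBytes) {
        process(row.data(), rows);
        ++rows;
    }
    return rows;
}
```

For the matrix in the question this would be called as forEachRow(fp, 8192, someCallback), where someCallback stands for whatever "do something" turns out to be.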
5G is a huge amount of memory. Are you sure you have this much physical memory on board? If not, the factor of 146 difference is probably due to swapping out to disk to try to free up memory.
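One way to check the swapping theory from inside the process is sketched below. majorFaults and physicalMemoryBytes are hypothetical helper names; getrusage() is standard on Linux, while _SC_PHYS_PAGES is a common extension (available in glibc) rather than strict POSIX, so treat that part as an assumption about your platform.

```cpp
#include <sys/resource.h>
#include <unistd.h>

// Major page faults taken by this process so far, i.e. faults that had to
// go to disk (swapped-out or not-yet-loaded pages). Returns -1 on error.
long majorFaults()
{
    struct rusage ru;
    if (getrusage(RUSAGE_SELF, &ru) != 0)
        return -1;
    return ru.ru_majflt;
}

// Installed physical RAM in bytes, or -1 if the system does not report it.
// Assumes sysconf() supports _SC_PHYS_PAGES (true for glibc on Linux).
long long physicalMemoryBytes()
{
    long pages = sysconf(_SC_PHYS_PAGES);
    long pageSize = sysconf(_SC_PAGESIZE);
    if (pages < 0 || pageSize < 0)
        return -1;
    return static_cast<long long>(pages) * pageSize;
}
```

Calling majorFaults() before and after the read plus the processing that follows, and comparing the two values, gives a hint: a large jump inside the application but not in the standalone test would be consistent with paging, although it does not prove it.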
You should also probably look at using a 64-bit OS on a 64-bit machine.
The process may not have 64MB of free store readily available in one contiguous block. Can you try splitting the 64MB buffer into a chain of smaller chunks, say 64K or 256K in size, and see if that helps improve performance?
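A minimal sketch of that experiment, assuming a chain of independently allocated chunks is acceptable for the later processing. readInChunks is a hypothetical helper, and the chunk size is a parameter so that 64K and 256K can both be tried.

```cpp
#include <cstddef>
#include <cstdio>
#include <utility>
#include <vector>

// Hypothetical helper: reads the rest of `fp` into a chain of separately
// allocated chunks of at most `chunkSize` bytes each, so no single large
// contiguous allocation is required. The last chunk may be shorter.
std::vector<std::vector<char> > readInChunks(FILE* fp, std::size_t chunkSize)
{
    std::vector<std::vector<char> > chunks;
    if (fp == nullptr || chunkSize == 0)
        return chunks;

    for (;;) {
        std::vector<char> chunk(chunkSize);
        std::size_t got = fread(chunk.data(), 1, chunkSize, fp);
        if (got == 0)
            break;                     // end of file or read error
        chunk.resize(got);             // trim a short final chunk
        chunks.push_back(std::move(chunk));
        if (got < chunkSize)
            break;                     // short read: nothing more to fetch
    }
    return chunks;
}
```

With 8192-byte rows, a 256K chunk (262144 bytes) holds exactly 32 rows, so no row would straddle two chunks.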