I was wondering what to set my buffer[] size to when reading files in Ubuntu.
Does anyone know the maximum size that can be read from a file with read() on a file descriptor?
I tried 1GB and got a segmentation fault; 4MB works fine.
I'm also not sure: is BUFSIZ (as in char buffer[BUFSIZ]) a different size on different platforms?
Any suggestions on the best size for the buffer?
Thanks
Using a fixed size for your buffer is probably not a good idea. You never know how large a file can be, really...
Reading large files into memory might not be desirable either, but if you must, first use the stat() or fstat() function to find out how large the file really is, then allocate the buffer dynamically with malloc()/calloc(), or use mmap().
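A minimal sketch of the fstat() plus malloc() approach, assuming a POSIX system such as Ubuntu. The helper name read_whole_file is hypothetical. It sizes a heap buffer from fstat() and then loops, because a single read() may return fewer bytes than requested.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: reads the whole file at `path` into a heap buffer.
 * On success returns the buffer (caller must free() it) and stores the
 * number of bytes read in *out_len; on failure returns NULL. */
char *read_whole_file(const char *path, size_t *out_len)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return NULL;

    struct stat st;
    if (fstat(fd, &st) < 0) {          /* ask the kernel for the file size */
        close(fd);
        return NULL;
    }

    size_t size = (size_t)st.st_size;
    char *buf = malloc(size > 0 ? size : 1);   /* heap, not stack; avoid malloc(0) */
    if (buf == NULL) {
        close(fd);
        return NULL;
    }

    size_t done = 0;
    while (done < size) {              /* read() may return a short count */
        ssize_t n = read(fd, buf + done, size - done);
        if (n < 0) {
            if (errno == EINTR)
                continue;              /* interrupted by a signal: retry */
            free(buf);
            close(fd);
            return NULL;
        }
        if (n == 0)
            break;                     /* file shrank after fstat() */
        done += (size_t)n;
    }

    close(fd);
    *out_len = done;
    return buf;
}
```

Typical use is `char *data = read_whole_file("input.txt", &len);` followed by `free(data);` once the contents are no longer needed.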
You can probably also Google to find information on how to use these functions. There should also be information about other ways to get the file size of a file.
But if you can avoid it, don't read huge files into memory. Rather, read a piece at a time and process each piece as you go.
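As a sketch of piece-by-piece processing (assuming POSIX read(); the function name and the 4096-byte chunk size are illustrative choices, not requirements), the following counts newlines while holding only one chunk in memory at a time, so memory use does not grow with the file size.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

#define CHUNK_SIZE 4096   /* illustrative; any modest size works */

/* Hypothetical example: count the newlines in a file by reading it one
 * chunk at a time. Returns the count, or -1 on error. */
long count_lines_chunked(const char *path)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    char chunk[CHUNK_SIZE];   /* small enough to live on the stack */
    long lines = 0;

    for (;;) {
        ssize_t n = read(fd, chunk, sizeof chunk);
        if (n == 0)
            break;            /* end of file */
        if (n < 0) {
            if (errno == EINTR)
                continue;     /* interrupted by a signal: retry */
            close(fd);
            return -1;
        }
        for (ssize_t i = 0; i < n; i++)   /* "process" this chunk */
            if (chunk[i] == '\n')
                lines++;
    }

    close(fd);
    return lines;
}
```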
The segmentation fault you received has nothing to do with maximum file sizes. Rather, you are allocating a buffer on the stack which exceeds your program's stack space.
When you declare an array like:
char buffer[BUFSIZ];
...it allocates BUFSIZ bytes on the stack. The amount of stack space you have varies by platform and configuration, but it is generally nowhere near 1 GB. On many Linux distributions the default stack size limit is 8 MB (see ulimit -s).
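To see the limit that a 1 GB stack array runs into, a process can query its own soft stack limit with getrlimit(RLIMIT_STACK, ...). This is a sketch assuming Linux/POSIX; the helper names are hypothetical, and the size check is only a rough one, since other stack frames also use part of the limit.

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/resource.h>

/* Returns the soft stack-size limit of the current process in bytes,
 * 0 if the limit is RLIM_INFINITY (unlimited), or -1 if getrlimit() fails. */
long long stack_limit_bytes(void)
{
    struct rlimit rl;
    if (getrlimit(RLIMIT_STACK, &rl) != 0)
        return -1;
    if (rl.rlim_cur == RLIM_INFINITY)
        return 0;
    return (long long)rl.rlim_cur;
}

/* Returns 1 if an array of `bytes` bytes clearly cannot fit on the stack
 * under the current soft limit, 0 otherwise (including "unlimited"). */
int too_big_for_stack(unsigned long long bytes)
{
    long long limit = stack_limit_bytes();
    return limit > 0 && bytes >= (unsigned long long)limit;
}
```

With an 8 MB limit, `too_big_for_stack(1ULL << 30)` returns 1 for the 1 GB buffer from the question.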
If you need to allocate a large buffer to read the file, you'll need to allocate it on the heap using one of the malloc family of functions.
char* buffer = malloc(BUFSIZ);
Remember, you'll also need to free the buffer when you are done using it.
free(buffer);
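A slightly fuller version of the malloc()/free() pattern, as a sketch (the helper names alloc_buffer and demo_big_buffer are hypothetical). The point is that malloc() reports failure by returning NULL, which you can check, whereas an oversized stack array simply crashes.

```c
#include <stdio.h>
#include <stdlib.h>

/* Allocates `size` bytes on the heap and reports failure instead of
 * crashing. The caller must free() the result. */
char *alloc_buffer(size_t size)
{
    char *buffer = malloc(size);
    if (buffer == NULL)
        fprintf(stderr, "malloc(%zu) failed\n", size);
    return buffer;
}

/* Example use: a 1 GiB buffer (the size from the question) on the heap. */
int demo_big_buffer(void)
{
    size_t size = (size_t)1 << 30;
    char *buffer = alloc_buffer(size);
    if (buffer == NULL)
        return -1;
    buffer[0] = 'x';   /* ... use the buffer, e.g. as a read() target ... */
    free(buffer);      /* release it when done */
    return 0;
}
```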
Don't read it all at once. The buffer size really depends on how much you are able to allocate: a few MB on the stack, and virtually unlimited with malloc (thanks to virtual memory). In the latter case, if your file is several GB, you need that much memory.
Just read it block by block using read()/fread() and you'll be safe. Nobody wants to fill their memory just to read a file. 4 KB is a fine buffer size, because it's usually the size of a memory page, and you can easily allocate it on the stack without a segfault.
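A sketch of block-by-block reading with fread(), assuming a 4096-byte block (the BLOCK_SIZE constant and the function names are illustrative). sysconf(_SC_PAGESIZE) is included so the actual page size can be checked rather than assumed.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <unistd.h>

#define BLOCK_SIZE 4096   /* 4 KiB: a typical page size, small enough for the stack */

/* Returns the system page size as reported by sysconf(), or -1 on error. */
long page_size(void)
{
    return sysconf(_SC_PAGESIZE);
}

/* Copies `in` to `out` one BLOCK_SIZE block at a time with fread()/fwrite().
 * Returns the number of bytes copied, or -1 on a read or write error. */
long long copy_blockwise(FILE *in, FILE *out)
{
    char block[BLOCK_SIZE];
    long long total = 0;
    size_t n;

    while ((n = fread(block, 1, sizeof block, in)) > 0) {
        if (fwrite(block, 1, n, out) != n)
            return -1;
        total += (long long)n;
    }
    return ferror(in) ? -1 : total;
}
```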
Reading the largest possible chunk is not necessarily the most efficient approach. The OS typically does its own buffering underneath, so the exact request size is not terribly important. However, reading in units of the sector size (often 4K) is a good choice for sequential reads.
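Rather than hard-coding 4K, a program can ask the filesystem for its preferred I/O size: fstat() fills in the st_blksize field. A sketch, where the helper name and the 4096 fallback are assumptions:

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/stat.h>

/* Returns the preferred I/O block size for `fd` (st_blksize), falling back
 * to 4096 if fstat() fails or reports a non-positive value. */
long preferred_io_size(int fd)
{
    struct stat st;
    if (fstat(fd, &st) != 0 || st.st_blksize <= 0)
        return 4096;   /* assumed default */
    return (long)st.st_blksize;
}
```

The returned value can be used as the buffer size for a read() loop on that descriptor.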
Don't forget that each filesystem type has its own maximum file size:
- IBM General Parallel File System = 2^99
- XFS = 8 EiB = 8 * 2^60
- OCFS = 4 PiB = 4 * 2^50
- ext4 = 16 TiB = 16 * 2^40
- ext2/ext3 = 2 TiB = 2 * 2^40
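On POSIX systems a related limit can be queried at run time with pathconf(path, _PC_FILESIZEBITS). This is a sketch with a hypothetical helper name; the answer is a bit count, so it may be coarser than the exact per-filesystem limits listed above.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <unistd.h>

/* Returns pathconf(path, _PC_FILESIZEBITS): the number of bits needed to
 * represent, as a signed integer, the largest regular file allowed under
 * `path`. Returns -1 on error (errno set) or when no fixed limit is
 * reported (errno left at 0). */
long file_size_bits(const char *path)
{
    errno = 0;   /* lets the caller tell "no limit" apart from an error */
    return pathconf(path, _PC_FILESIZEBITS);
}
```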