I'm reading from certain offsets in several hundred, and possibly thousands of, files. Because I only need certain data from certain offsets at a particular time, I must either keep the file handles open for later use, OR write the parts I need into separate files.
I figured keeping all these file handles open is the lesser of two evils compared to doing a significant amount of disk writing to create new temporary files. I'm just worried about the efficiency of having so many file handles open.
Typically, I'll open a file, seek to an offset, and read some data; then 5 seconds later I'll do the same thing at another offset, and I'll do all this across thousands of files within a 2-minute timeframe.
Is that going to be a problem?
A follow-up: really, I'm asking which is better: to leave these thousands of file handles open, or to constantly close them and re-open them only at the instant I need them.
Some systems may limit the number of file descriptors that a single process can have open simultaneously. 1024 is a common default, so if you need "thousands" open at once, you might want to err on the side of portability and design your application to use a smaller pool of open file descriptors.
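For example, on POSIX systems you can inspect (and, up to the hard limit, raise) the per-process descriptor limit from Python via the standard `resource` module. A minimal sketch; the specific cap of 4096 here is just an illustrative value:

```python
import resource

# Query the current soft and hard limits on open file descriptors.
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print(f"soft limit: {soft}, hard limit: {hard}")

# Optionally raise the soft limit, but never beyond the hard limit.
resource.setrlimit(resource.RLIMIT_NOFILE, (min(4096, hard), hard))
```

Even when you can raise the limit, designing around a bounded pool keeps the application portable to systems where you can't.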
I recommend that you take a look at Storage.py in BitTorrent. It includes an implementation of a pool of file handles.
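The core idea is a cache that caps the number of simultaneously open handles and evicts the least-recently-used one when the cap is hit. Here's a minimal sketch of that technique (not the actual Storage.py code; the class and method names are hypothetical):

```python
import collections

class FileHandlePool:
    """Keep at most max_open file handles; evict the least-recently-used."""

    def __init__(self, max_open=512):
        self.max_open = max_open
        self._handles = collections.OrderedDict()  # path -> open file object

    def read_at(self, path, offset, size):
        f = self._handles.get(path)
        if f is None:
            if len(self._handles) >= self.max_open:
                # Close the least-recently-used handle to stay under the cap.
                _, lru = self._handles.popitem(last=False)
                lru.close()
            f = open(path, 'rb')
            self._handles[path] = f
        else:
            # Mark this handle as most recently used.
            self._handles.move_to_end(path)
        f.seek(offset)
        return f.read(size)

    def close_all(self):
        for f in self._handles.values():
            f.close()
        self._handles.clear()

# Usage: reads transparently reuse or re-open handles as needed.
# pool = FileHandlePool(max_open=512)
# data = pool.read_at('/some/file.bin', 1024, 64)
```

With a pool like this, your access pattern (re-reading the same files every few seconds) mostly hits already-open handles, while the total count stays safely below the system limit.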