What's the fastest way to read the first N characters of a line in a large file?

Developer https://www.devze.com 2023-02-22 06:18 Source: web
E.g. you have a file with 3 lines that are millions of characters long. What's the fastest/most efficient way to read the first N characters of line 2?


Either you have some insight into where the line breaks might be...

  • lines are equally long so divide the file size by three and seek to that address
  • only one line's long but you're not sure which, so read a chunk from the start and end of the file and work out which line it is

...or you're completely in the dark and have to just scan your way along until you find the first newline:

  • use memory mapped I/O for fastest-possible scanning speed, AND
  • tell the operating system (if it lets you) that you're doing a sequential read, so it doesn't flood its cache with file data you're scanning past and won't need again (e.g. on Linux, use madvise() with MADV_SEQUENTIAL - http://linux.die.net/man/2/madvise )


There is no way to avoid the scan. A line ending is just another character (or character pair) that you have to search the file for. You can tune how the file is read, e.g. by using memory mapping, but the bytes of line 1 still have to be looked at.

If the file isn't changing between reads, creating a separate index file containing the positions of the line breaks will save the scanning time the next time. If you have control over the file writing you can of course create an index (in a separate file, or in a header) when you write the file.
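
A minimal sketch of the index idea, assuming '\n' line endings and keeping the index in memory rather than writing it to a separate file (persisting it is left out here). The names build_line_index and read_line_prefix are hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <fstream>
#include <string>
#include <vector>

// Scan the file once and record the byte offset at which each line starts.
// A file that ends in '\n' gets one extra entry equal to the file size.
std::vector<std::streamoff> build_line_index(const std::string& path) {
    std::vector<std::streamoff> starts;
    std::ifstream in(path, std::ios::binary);
    if (!in) return starts;

    starts.push_back(0);
    std::vector<char> buf(1 << 16);
    std::streamoff pos = 0;
    while (in) {
        in.read(buf.data(), static_cast<std::streamsize>(buf.size()));
        const std::streamsize got = in.gcount();
        for (std::streamsize i = 0; i < got; ++i) {
            if (buf[static_cast<std::size_t>(i)] == '\n') {
                starts.push_back(pos + i + 1);
            }
        }
        pos += got;
    }
    return starts;
}

// Read up to n bytes from the start of line `line_no` (0-based), seeking
// straight to it with the index instead of rescanning the file.
std::string read_line_prefix(const std::string& path,
                             const std::vector<std::streamoff>& starts,
                             std::size_t line_no, std::size_t n) {
    if (line_no >= starts.size()) return {};
    std::ifstream in(path, std::ios::binary);
    if (!in) return {};
    in.seekg(starts[line_no]);

    // Do not read past the end of this line when the next start is known.
    if (line_no + 1 < starts.size()) {
        const std::size_t len = static_cast<std::size_t>(
            starts[line_no + 1] - starts[line_no] - 1);  // exclude '\n'
        n = std::min(n, len);
    }
    std::string out(n, '\0');
    if (n > 0) in.read(&out[0], static_cast<std::streamsize>(n));
    out.resize(static_cast<std::size_t>(in.gcount()));
    return out;
}
```

With the index built once, read_line_prefix(path, starts, 1, N) reads the first N bytes of line 2 with a single seek. The index is only valid as long as the file is unchanged.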

Joel has written a blog post on the subject.


Use ifstream::ignore().

#include <iostream>
#include <fstream>
#include <limits>

using namespace std;

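int main() {
    ifstream file("file.txt");
    if (!file) {
        cerr << "cannot open file.txt\n";
        return 1;
    }

    // skip first line: discard everything up to and including the first '\n'
    file.ignore(numeric_limits<streamsize>::max(), '\n');

    // print next N characters; get() stores at most count - 1 characters
    // plus a terminating '\0', so the buffer and the count are both N + 1
    const int N = 15;
    char s[N + 1];
    file.get(s, N + 1);
    cout << s << "\n";

    return 0;
}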


Right after reading all the characters of line 1?

Joking aside ... how about matching your read size to the block size of the filesystem? You've still got to get past that first line, but reading it in block-sized chunks is likely to be the most efficient way to do it.

If you knew the actual length of the first line, you could just seek past it without reading the data.


I think seeking and then reading character by character works well. If you know a lower bound on the length of your lines, seek to that bound and then read up to the next line break (for line 2 of a 3-line file).
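
A small sketch of the lower-bound idea using the standard streams. It assumes '\n' line endings and that min_first_line_len really is no larger than the length of line 1 (otherwise the seek lands in line 2 and the result is wrong). The function and parameter names are hypothetical.

```cpp
#include <cstddef>
#include <fstream>
#include <limits>
#include <string>
#include <vector>

// Skip the part of line 1 that is known to contain no newline, scan the
// rest of it, then return up to n characters from the start of line 2.
std::string second_line_prefix_from_bound(const std::string& path,
                                          std::streamoff min_first_line_len,
                                          std::size_t n) {
    std::ifstream in(path, std::ios::binary);
    if (!in) return {};

    // Jump over the bytes that cannot contain the first line break.
    in.seekg(min_first_line_len);

    // Discard the remainder of line 1, including its '\n'.
    in.ignore(std::numeric_limits<std::streamsize>::max(), '\n');

    // get() stores at most count - 1 characters plus '\0' and stops
    // before a '\n', so a short line 2 is returned without the newline.
    std::vector<char> buf(n + 1);
    in.get(buf.data(), static_cast<std::streamsize>(n + 1));
    return std::string(buf.data(), static_cast<std::size_t>(in.gcount()));
}
```

The closer the bound is to the real length of line 1, the less there is left to scan; with a bound of 0 this is the same as the plain ignore() approach.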
