
Mapping a flat text file

Source: https://www.devze.com, 2023-04-12 10:17 (reposted from the web)

In a text file, lines are delimited by \n at the end of each line. Finding a given line therefore means reading the file from the start, which is a big problem for large files (say 2 GB). I am looking for a method to read a single line without walking through the entire file (though I know this may be a complicated process).

  1. The first way I know is to use fseek() with an offset, but that is not practical.
  2. Creating a flat key/value file; but I am not sure there is a way to avoid loading the entire file into RAM (it should be something like reading an array in PHP).
  3. Alternatively, can we put numbers at the beginning of each line? I mean, is it possible to read only the leading digits of each line and skip the line contents (jumping straight to the next line)?

    768| line content is here
    769| another line
    770| something
    

If only the leading digits are read, the total amount of data read stays small even for large files.


Do you need to read specific lines that can be indexed by line number? If so, just do a binary search. Read (say) 200 characters from the middle of the file to find out a line number. Then repeat in whichever half contains your target until you reach the right line.
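
A minimal sketch of that binary search in Python, in case it helps. It assumes every line starts with a number followed by `|` (as in the question's sample), that those numbers are in ascending order, and that lines end with \n. The function name `find_numbered_line` is made up for illustration. Instead of reading a fixed 200 characters, it seeks to the middle and reads up to the next newline to get aligned on a line start.

```python
import os


def find_numbered_line(path, target):
    """Return the content of the line prefixed with `target`, or None.

    Assumes lines look like b"768| some content\n" and that the numeric
    prefixes are sorted in ascending order.
    """
    with open(path, "rb") as f:
        # Invariant: if the target line exists, it starts at a byte
        # offset in [lo, hi), and lo is always the start of a line.
        lo, hi = 0, os.path.getsize(path)
        while lo < hi:
            mid = (lo + hi) // 2
            if mid > lo:
                # Step back one byte and consume through the next "\n";
                # this lands on the first line start at or after mid.
                f.seek(mid - 1)
                f.readline()
                start = f.tell()
            else:
                start = lo
            if start >= hi:
                # No line starts in [mid, hi); look in the lower part.
                hi = mid
                continue
            f.seek(start)
            line = f.readline()
            prefix, _, content = line.partition(b"|")
            number = int(prefix)
            if number == target:
                return content.strip().decode("utf-8")
            if number < target:
                lo = f.tell()  # start of the following line
            else:
                hi = mid
    return None
```

Each probe reads at most a couple of lines, so the number of reads grows with the logarithm of the file size rather than with the file size itself.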


I think there is no simple way to do what you want. Records have variable length, and no length can be determined in advance, right?

If the file is always the same (or at least not modified frequently), I'd put it into a database, or at least create an index file (record number: offset) and use it with fseek().
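
A rough sketch of the index-file idea in Python, under a few assumptions of mine: the index is a separate binary file holding one 8-byte offset per line, line numbers are 0-based, and the data file does not change after indexing. The names `build_index` and `read_line` are hypothetical, not from any library.

```python
import struct

# One unsigned 64-bit little-endian byte offset per line (an assumed layout).
OFFSET = struct.Struct("<Q")


def build_index(data_path, index_path):
    """Scan the data file once and record where each line starts.

    Returns the number of lines indexed.
    """
    count = 0
    offset = 0
    with open(data_path, "rb") as data, open(index_path, "wb") as index:
        for line in data:  # binary iteration splits on b"\n"
            index.write(OFFSET.pack(offset))
            offset += len(line)
            count += 1
    return count


def read_line(data_path, index_path, line_no):
    """Return line `line_no` (0-based) without its line ending."""
    if line_no < 0:
        raise IndexError("line number must be non-negative")
    with open(index_path, "rb") as index:
        # Entries are fixed width, so entry n sits at byte n * 8.
        index.seek(line_no * OFFSET.size)
        raw = index.read(OFFSET.size)
    if len(raw) < OFFSET.size:
        raise IndexError("line number past end of file")
    with open(data_path, "rb") as data:
        data.seek(OFFSET.unpack(raw)[0])
        return data.readline().rstrip(b"\r\n").decode("utf-8")
```

Building the index still costs one full pass over the file, but after that each lookup is two seeks and two small reads, and neither file has to be loaded into RAM. If the data file is modified, the index has to be rebuilt.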


Alternatively, you can index your text file once up front and then carry out your daily operation of picking single lines using the index file. You can find how to index your text file here or here. Indexing a text file is no different from indexing a CSV or variable-record file.
