开发者

search algorithm on large files

开发者 https://www.devze.com 2023-01-06 16:36 出处:网络
I need help in deciding with search algorithm to use for searching large files. here is what I am doing. Lets say file consists of time range t1 to t2. (t2>t1)

I need help in deciding with search algorithm to use for searching large files. here is what I am doing. Lets say file consists of time range t1 to t2. (t2>t1)

I need to get file offsets (fseek) of:

  1. time t3 that is bigger than t1
  2. time t4 that is smaller than time t2

    | ------| ---|----------------|
    
    t1      t3   t4              t2
    

Naive version is to iterate lines through whole file and return fseek when current time is t3, start from returned seek and iterate while current time is t4, return second fseek

Now lets say file is 100GB and i need to iterate while file just to get period of 2 seconds. Then this logic becomes too CPU and filesystem expensive. Looking for better solutions. Language in use is C. Lines are currently fixed size but I'd like to look into futur开发者_如何转开发e and deal with some algorithm that doesn't use fixed size lengths.


You can use a binary search if the times in the file are all sorted. Even better if the records in your file are of a fixed width, but you probably can make use of it even if they are not, with some work.


Since the values are fixed width, something like a binary search or an interpolation search sound like the best options. Also, if you plan on working with files in those size classes (100GB), you should consider using fgetpos/fsetpos due to the file size limits of fseek.

0

精彩评论

暂无评论...
验证码 换一张
取 消