
In Python, how do I search a flat file for the closest match to a particular numeric value?

Developer https://www.devze.com 2023-01-03 09:02 Source: Internet

I have a file with data in this format:

3.343445 1  
3.54564 1  
4.345535 1  
2.453454 1

and so on, up to 1000 lines. I am given a number such as a=2.44443, and I need to find the row number of the value in the file that is closest to "a". At present I load the whole file into a list, compare each element, and keep the closest one. Is there a better, faster method?

My code is below. I need to run this about 20000 times, on a different file each time, so I want a fast method.

p=os.path.join("c:/begpython/wavnk/",str(str(str(save_a[1]).replace('phone','text'))+'.pm'))
x=open(p , 'r')
for i in range(6):
    x.readline()

j=0
o=[]
for line in x:
    oj=str(str(line).rstrip('\n')).split(' ')
    o=o+[oj]
    j=j+1

temp=long(1232332)
end_time=save_a[4]

for i in range((j-1)):
    diff=float(o[i][0])-float(end_time)
    if diff<0:
        diff=diff*(-1)
    if temp>diff:
        temp=diff
        pm_row=i


>>> gen = (float(line.partition(' ')[0]) for line in open(fname))
>>> min(enumerate(gen), key=lambda x: abs(x[1] - a))
(3, 2.453454)


If the file isn't sorted, no, there is no faster method.

Actually, let me rephrase: the fastest algorithm is to go through the file line by line, compare the first number on each line with your target value, and save the line number where the difference is smallest. From your description, though, it sounds like your implementation is inefficient. You don't need to load the whole file into memory; Python lets you iterate through it one line at a time, like so:

a = 2.44443
min_line = 0
min_diff = float('inf')  # larger than any real difference
with open('file.txt', 'r') as f:
    for i, line in enumerate(f):
        diff = abs(float(line.split()[0]) - a)
        if diff < min_diff:
            min_line = i
            min_diff = diff

EDIT: This assumes that you're only going to be searching the file for one value of a. If you're going to be repeatedly searching for several different values of a, then sorting the file and doing a binary search as other answers suggest becomes quicker.
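Since the question mentions a different file on each run and a few header lines to skip, the single pass above could be wrapped in a reusable function. This is only a minimal sketch: the name `closest_row_in_file` and the `skip` parameter are hypothetical, and it assumes every non-blank data line starts with a number.

```python
from itertools import islice


def closest_row_in_file(path, a, skip=0):
    """Return (row, value) for the first-column value nearest to a.

    Rows are counted from 0 after dropping `skip` header lines;
    blank lines are ignored and do not get a row number.
    """
    with open(path, 'r') as f:
        data_lines = islice(f, skip, None)  # drop the header lines lazily
        values = (float(line.split()[0]) for line in data_lines if line.strip())
        # min() consumes the generator while the file is still open
        return min(enumerate(values), key=lambda pair: abs(pair[1] - a))
```

Calling it with `skip=6` would mirror the six `readline()` calls in the question. An empty file makes `min()` raise `ValueError`, which the caller would need to handle.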


Retrieve all the numbers and use bisect.insort to store them in a sorted list (or just throw them in any order and sort yourself); then use bisect to easily find the next higher and next lower number, and take the closer of the two.

This approach (that depends on an already-sorted list) is algorithmically much more efficient than iterating over the entire unsorted list each time you need to find a "close" number.
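A rough sketch of that idea, assuming the values are already in a Python list: sort once, keep the original row numbers alongside, then use `bisect.bisect_left` to find the neighbours of "a". The helper names `build_index` and `closest_row` are made up for illustration.

```python
import bisect


def build_index(values):
    """Sort the values once, remembering each value's original row number."""
    pairs = sorted((v, row) for row, v in enumerate(values))
    sorted_values = [v for v, _ in pairs]
    rows = [row for _, row in pairs]
    return sorted_values, rows


def closest_row(sorted_values, rows, a):
    """Return (row, value) of the entry nearest to a, using binary search."""
    pos = bisect.bisect_left(sorted_values, a)
    # The nearest value is one of the two neighbours of the insertion point.
    candidates = [i for i in (pos - 1, pos) if 0 <= i < len(sorted_values)]
    best = min(candidates, key=lambda i: abs(sorted_values[i] - a))
    return rows[best], sorted_values[best]


values = [3.343445, 3.54564, 4.345535, 2.453454]
sorted_values, rows = build_index(values)
print(closest_row(sorted_values, rows, 2.44443))  # (3, 2.453454)
```

The sort costs more than one linear scan, so this only pays off when the same data is searched several times.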


Here's one suggestion. After loading the data into a list, sort it in ascending order. First compare "a" with the last item in the list: if "a" is greater, nothing in the list exceeds it and the last item is the closest. Otherwise, check each value in order and stop once you reach a value higher than "a". Then compare "a" with that value and the one just before it to see which is closer.

Be sure to store the row number in your list when you originally scan in the data. That preserves it for you to retrieve it after the sort.
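One way this suggestion might look in code, as a sketch rather than a tuned implementation. The function name `closest_by_sorted_scan` is hypothetical, and the list is assumed to be non-empty.

```python
def closest_by_sorted_scan(values, a):
    """Sort (value, row) pairs, then scan upward until a value exceeds a."""
    # Storing the row number with each value preserves it across the sort.
    pairs = sorted((v, row) for row, v in enumerate(values))

    # If a is at or beyond the largest value, the last entry is the closest.
    if a >= pairs[-1][0]:
        return pairs[-1][1], pairs[-1][0]

    prev = None
    for value, row in pairs:
        if value > a:
            # Compare the first value above a with the one just below it.
            if prev is not None and abs(prev[0] - a) <= abs(value - a):
                return prev[1], prev[0]
            return row, value
        prev = (value, row)


print(closest_by_sorted_scan([3.343445, 3.54564, 4.345535, 2.453454], 2.44443))
# (3, 2.453454)
```

The scan after sorting is still linear, so for a single lookup this does more work than one pass over the unsorted data.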


a = 2.44443
closest = None
with open('somefile.txt', 'r') as f:
    for line in f:  # iterates lazily, so large files are fine
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        b = float(fields[0])
        if closest is None or abs(a - b) < abs(a - closest):
            closest = b
print("The closest is", closest)