Getting the maximum value from dictionary

开发者 https://www.devze.com 2023-02-08 19:02 Source: Internet

I'm facing a problem with this. I have 10,000 rows in my dictionary, and this is one of the rows:

Example: A (8) C (4) G (48419) T (2) when printed out

I'd like to get 'G' as an answer, since it has the highest value.

I'm currently using Python 2.4 and I have no idea how to solve this, as I'm quite new to Python.

Thanks a lot for any help given :)


Here's a solution that

  1. uses a regexp to scan all occurrences of an uppercase letter followed by a number in brackets
  2. transforms the string pairs from the regexp with a generator expression into (value,key) tuples
  3. returns the key from the tuple that has the highest value

I also added a main function so that the script can be used as a command line tool to read all lines from one file and then write the key with the highest value for each line to an output file. The program uses iterators, so it stays memory efficient no matter how large the input file is.

import re
KEYVAL = re.compile(r"([A-Z])\s*\((\d+)\)")

def max_item(row):
    return max((int(v),k) for k,v in KEYVAL.findall(row))[1]

def max_item_lines(fh):
    for row in fh:
        yield "%s\n" % max_item(row)

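def process_file(infilename, outfilename):
    infile = open(infilename)
    outfile = open(outfilename, "w")
    try:
        # writelines pulls rows from the generator one at a time
        outfile.writelines(max_item_lines(infile))
    finally:
        # close both files even if a row fails to parse
        outfile.close()
        infile.close()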

if __name__ == '__main__':
    import sys
    infilename, outfilename = sys.argv[1:]
    process_file(infilename, outfilename)

For a single row, you can call:

>>> max_item("A (8) C (4) G (48419) T (2)")
'G'

And to process a complete file:

>>> process_file("inputfile.txt", "outputfile.txt")

If you want an actual Python list holding the key with the highest value for every row, then you can use:

>>> map(max_item, open("inputfile.txt"))
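
Note that `map` returns a list only on Python 2; on Python 3 it returns a lazy iterator. Below is a minimal sketch that builds a real list on either version. It assumes that rows without any `LETTER (number)` pair should be skipped, which is my assumption, since `max_item` as written raises `ValueError` on such a row. `max_items` is a hypothetical helper name, not part of the answer above.

```python
import re

KEYVAL = re.compile(r"([A-Z])\s*\((\d+)\)")

def max_item(row):
    # Same idea as above: build (value, key) tuples and take the key of the largest
    return max([(int(v), k) for k, v in KEYVAL.findall(row)])[1]

def max_items(lines):
    # Hypothetical helper: skip rows with no matches instead of raising ValueError
    return [max_item(line) for line in lines if KEYVAL.search(line)]
```

Any iterable of strings works as `lines`, including an open file object.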


max(d.itervalues())

This returns the highest value itself, not its key. On Python 2 it avoids building the intermediate list that d.values() creates, because itervalues() returns an iterator.
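
Since the question asks for the key ('G') rather than the value, here is a minimal sketch of one way to get it, assuming the row is already stored as a dict that maps letters to counts. `key_of_max` and `counts` are hypothetical names used only for illustration.

```python
def key_of_max(d):
    # Build (value, key) pairs so that tuples compare by value first.
    # This avoids max(key=...), which Python 2.4 does not support.
    return max([(value, key) for key, value in d.items()])[1]

counts = {"A": 8, "C": 4, "G": 48419, "T": 2}  # hypothetical row as a dict
print(key_of_max(counts))  # prints G
```

On Python 2.5 or later, `max(counts, key=counts.get)` returns the same key when the maximum is unique, without creating the intermediate tuples.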


Try the following:

st = "A (8) C (4) G (48419) T (2)" # your start string
a=st.split(")")
b=[x.replace("(","").strip() for x in a if x!=""]
c=[x.split(" ") for x in b]
d=[(int(x[1]),x[0]) for x in c]
max(d) # (48419, 'G'); use max(d)[1] to get just 'G'


Use regular expressions to split the line. Then for all the matched groups, you have to convert the matched strings to numbers, get the maximum, and figure out the corresponding letter.

import re
r = re.compile(r'A \((\d+)\) C \((\d+)\) G \((\d+)\) T \((\d+)\)')
for line in my_file:
  m = r.match(line)
  if not m:
    continue # or complain about invalid line
  value, n = max((int(value), n) for (n, value) in enumerate(m.groups()))
  print "ACGT"[n], value
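
The loop above relies on a file object named `my_file` and prints its result directly. As a rough sketch, the same fixed-order match can be wrapped in a function so it can be tried on a single string. `max_base` and `ROW` are hypothetical names, and returning `None` for an invalid line is an assumption about what "complain" should mean.

```python
import re

# Fixed-order pattern: the four counts must appear as A, C, G, T
ROW = re.compile(r"A \((\d+)\) C \((\d+)\) G \((\d+)\) T \((\d+)\)")

def max_base(line):
    m = ROW.match(line)
    if not m:
        return None  # invalid line; the caller decides how to complain
    # Pair each count with its position; the position indexes into "ACGT"
    value, n = max([(int(v), n) for n, v in enumerate(m.groups())])
    return "ACGT"[n], value
```

Unlike the `findall` approach, this pattern rejects rows whose letters are missing or out of order.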


row = "A (8) C (4) G (48419) T (2)"

lst = row.replace("(",'').replace(")",'').split() # ['A', '8', 'C', '4', 'G', '48419', 'T', '2']

dd = dict(zip(lst[0::2],map(int,lst[1::2]))) # {'A': 8, 'C': 4, 'T': 2, 'G': 48419} 

max(map(lambda k:[dd[k],k], dd))[1] # 'G'
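
One caveat of the `[value, key]` trick: if two letters share the highest count, `max` falls back to comparing the keys and returns the alphabetically later one. Below is a small sketch, assuming you would rather see every tied letter. `keys_of_max` is a hypothetical helper name.

```python
def keys_of_max(dd):
    # Find the highest count, then collect every key that reaches it
    highest = max(dd.values())
    return sorted([k for k in dd if dd[k] == highest])

print(keys_of_max({"A": 8, "C": 4, "G": 48419, "T": 2}))  # ['G']
print(keys_of_max({"A": 7, "C": 7, "G": 1, "T": 2}))      # ['A', 'C']
```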