Find lines beginning with the same string and keep the last occurrence

开发者 https://www.devze.com 2023-03-22 15:40 Source: web
I have this data:

E 71484666NC 1201011060240260 387802-1227810  1022    25   0   5   2   313D 0 1G5
E 71484666NC 1201011060240263 387902-1227910  1300    10   0   2   1   300D 0 1A5
E 10115693AK 1201011060617450 658160-1517007   021 10  0 896  71   4   131L 2 AA2
E 10310002PR 0201011060102315 191509 -664820 39726  5  5 935  50  46 21282D 5 0hn

I need to find lines that share the same first 12 characters. If there are multiples, I need to delete the earlier occurrences and keep only the last one. The result should look like this:

E 71484666NC 1201011060240263 387902-1227910  1300    10   0   2   1   300D 0 1A5
E 10115693AK 1201011060617450 658160-1517007   021 10  0 896  71   4   131L 2 AA2
E 10310002PR 0201011060102315 191509 -664820 39726  5  5 935  50  46 21282D 5 0hn

Note: In most cases the characters after the first 12 do not match, so checking for duplicate whole lines is not an option.

Note: Need to preserve the order.


from collections import OrderedDict

lines = OrderedDict()
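for line in file:  # file: an open file object, or any iterable of lines
    # A later line with the same 12-character key replaces the earlier one.
    lines[line[0:12]] = line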

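This keeps one line per 12-character key. Each key stays at the position where it first appeared, while its value is the last line seen with that key, so the order of the lines is preserved and earlier duplicates are dropped.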

Edit: collections.OrderedDict is available from Python 2.7 onward. For Python 2.4, 2.5, and 2.6, a standalone OrderedDict backport is needed.
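
As a rough sketch, the loop above could be wrapped in a small reusable function. The function name keep_last_by_prefix, the width parameter, and the shortened sample lines are assumptions made for illustration; they are not part of the original answer.

```python
from collections import OrderedDict


def keep_last_by_prefix(lines, width=12):
    """Return one line per distinct prefix, keeping the last line seen for each.

    Hypothetical helper: the name and the `width` parameter are illustrative.
    """
    latest = OrderedDict()
    for line in lines:
        # Assigning to an existing key replaces the value but keeps the
        # key's original position, so output follows first-seen order.
        latest[line[:width]] = line
    return list(latest.values())


# Shortened sample lines (made up for this sketch); the first two share a prefix.
sample = [
    "E 71484666NC 1201011060240260 first",
    "E 71484666NC 1201011060240263 second",
    "E 10115693AK 1201011060617450 third",
]
result = keep_last_by_prefix(sample)
for line in result:
    print(line)
```

An open file object can be passed in place of the list, since iterating over a file yields its lines. In that case each returned line still carries its trailing newline.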


from collections import OrderedDict

mydata = """E 71484666NC 1201011060240260 387802-1227810  1022    25   0   5   2   313D 0 1G5
E 71484666NC 1201011060240263 387902-1227910  1300    10   0   2   1   300D 0 1A5
E 10115693AK 1201011060617450 658160-1517007   021 10  0 896  71   4   131L 2 AA2
E 10310002PR 0201011060102315 191509 -664820 39726  5  5 935  50  46 21282D 5 0hn"""

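datalines = mydata.split('\n')
# Key on the first 12 characters; a later duplicate key replaces the stored remainder.
uniques = OrderedDict((x[:12], x[12:]) for x in datalines)
# Rejoin each key with the remainder of its last line.
final = [x + y for x, y in uniques.items()]

for x in final:
    print(x)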

This produces:

E 71484666NC 1201011060240263 387902-1227910  1300    10   0   2   1   300D 0 1A5
E 10115693AK 1201011060617450 658160-1517007   021 10  0 896  71   4   131L 2 AA2
E 10310002PR 0201011060102315 191509 -664820 39726  5  5 935  50  46 21282D 5 0hn


Use a dictionary, taking the first 12 characters as a key:

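mydict = {}
for line in file:  # file: an open file object, or any iterable of lines
    key = line[:12]     # the first 12 characters identify the record
    mydict[key] = line  # a later line with the same key replaces the earlier one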

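This automatically overwrites any previous entry with the same key. Note that a plain dict only guarantees insertion order on Python 3.7 and later; on older versions, use an OrderedDict if the order of the lines must be preserved.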
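
One subtlety: with an ordered dictionary, each surviving line ends up where its key first appeared. In the question's data the duplicates are adjacent, so this makes no visible difference. If duplicates could be interleaved with other records and the survivor should sit where its last occurrence was, a reverse pass is one possible variation. The sketch below assumes that requirement; the function name and sample data are hypothetical.

```python
def keep_last_in_last_position(lines, width=12):
    """Keep the last line per prefix, ordered by where that last line appeared.

    Hypothetical helper: the name and the `width` parameter are illustrative.
    """
    seen = set()
    kept = []
    # Walk backwards so the first line met for each prefix is its last occurrence.
    for line in reversed(list(lines)):
        key = line[:width]
        if key not in seen:
            seen.add(key)
            kept.append(line)
    kept.reverse()  # restore the original top-to-bottom order
    return kept


# Made-up data with interleaved duplicates of the "AAAA" prefix.
interleaved = ["AAAA 1", "BBBB 1", "AAAA 2"]
reordered = keep_last_in_last_position(interleaved, width=4)
print(reordered)
```

Here the surviving "AAAA" line comes after the "BBBB" line, because that is where its last occurrence was. The dictionary approach would list it first.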
