
Python or command line utility - sort and filter file?

Developer https://www.devze.com 2023-03-30 21:50 Source: Internet

Given data of the form:

a b 1.1
c d 2.3
b a 1.1

Is it possible to sort such a file on the third column and remove lines whose third-column value is duplicated, so that the output is:

a b 1.1
c d 2.3

or,

c d 2.3
b a 1.1

.

I can only use Python, R, or command-line utilities for this task, and it has to run on a set of very large files.

Thanks!


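Unix sort should be able to do the work for you. Here -k3,3n sorts numerically on the third field only (add r for descending order), and -u keeps a single line for each distinct key: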

cat file | sort -u -k3,3n
a b 1.1
c d 2.3
cat file | sort -u -k3,3rn
c d 2.3
a b 1.1


# Keep the first line seen for each distinct third-column value.
# Input order is preserved; this script does not sort.
seen = set()  # third-column values already written (set lookups stay fast on large files)
with open('text.txt') as src, open('output.txt', 'w') as dst:
    for line in src:
        data = line.split()  # split on whitespace; also drops the trailing newline
        if len(data) >= 3 and data[2] not in seen:
            seen.add(data[2])
            dst.write(' '.join(data[:3]) + '\n')
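
The script above removes duplicates but leaves the rows in input order, while the question also asks for sorting. A minimal sketch that does both is shown here. It assumes the third column always parses as a number and that one line per distinct value fits in memory; the function name sort_unique_by_third is made up for illustration and is not part of the original answer.

```python
def sort_unique_by_third(lines, reverse=False):
    """Return lines sorted numerically by column 3, one line per distinct value."""
    first_seen = {}  # numeric third-column value -> first line that carried it
    for line in lines:
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or short lines
        key = float(fields[2])  # assumption: column 3 is numeric (else ValueError)
        first_seen.setdefault(key, line.rstrip("\r\n"))
    return [first_seen[k] for k in sorted(first_seen, reverse=reverse)]


if __name__ == "__main__":
    sample = ["a b 1.1\n", "c d 2.3\n", "b a 1.1\n"]
    for row in sort_unique_by_third(sample):
        print(row)
```

Because the function accepts any iterable of lines, an open file object can be passed in directly, so the input is read one line at a time and only the retained lines are held in memory.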