I would like to sort a file on multiple fields. A sample tab-separated file is:
a 1 1.0
b 2 0.1
c 3 0.3
a 4 0.001
c 5 0.5
a 6 0.01
b 7 0.01
a 8 0.35
b 9 2.3
c 10 0.1
c 11 1.0
b 12 3.1
a 13 2.1
I would like to have it sorted alphabetically by field 1 (with -d), and, when field 1 is the same, by field 3 (with the -g option).
I didn't succeed in doing this. My attempts were (with a real TAB character instead of <TAB>):
cat tst | sort -t"<TAB>" -k1 -k3n
cat tst | sort -t"<TAB>" -k1d -k3n
cat tst | sort -t"<TAB>" -k3n -k1d
None of these work. I'm not sure whether sort can even do this. I'll write a script as a workaround, so I'm just curious whether there is a solution using only sort.
The manual shows some examples.
Following zseder's comment, this works:
sort -t"<TAB>" -k1,1d -k3,3g
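The ",1" and ",3" end positions are what make this work. A key given as -k1 with no end field runs from field 1 to the end of the line, so every line gets a distinct first key and the second key is never reached. Below is a rough Python model of that behaviour, assuming the C locale (plain character comparison) and tab-separated fields; field_key and dictionary_order are hypothetical helpers for illustration, not part of any library.

```python
from typing import List, Optional

# Sample data from the question, with real tab characters between fields.
SAMPLE = (
    "a\t1\t1.0\nb\t2\t0.1\nc\t3\t0.3\na\t4\t0.001\nc\t5\t0.5\n"
    "a\t6\t0.01\nb\t7\t0.01\na\t8\t0.35\nb\t9\t2.3\nc\t10\t0.1\n"
    "c\t11\t1.0\nb\t12\t3.1\na\t13\t2.1\n"
)


def field_key(line: str, start: int, end: Optional[int] = None) -> str:
    """Hypothetical model of -kSTART[,END]: 1-based, inclusive fields.

    With end=None the key runs to the end of the line, as -k1 does.
    """
    fields = line.split("\t")
    stop = len(fields) if end is None else end
    return "\t".join(fields[start - 1:stop])


def dictionary_order(text: str) -> str:
    # Rough model of -d: keep only blanks and alphanumeric characters.
    return "".join(ch for ch in text if ch.isalnum() or ch in " \t")


lines: List[str] = SAMPLE.splitlines()

# -k1 -k3n: the first key is the whole line, unique for every line, so the
# second key is never consulted; field 2 (compared as text) breaks field-1 ties.
print(len({field_key(line, 1) for line in lines}) == len(lines))  # True
whole = sorted(lines, key=lambda line: field_key(line, 1))

# -k1,1d -k3,3g: field 1 only, then field 3 as a floating-point number.
fixed = sorted(
    lines,
    key=lambda line: (dictionary_order(field_key(line, 1, 1)),
                      float(field_key(line, 3, 3))),
)
for line in fixed:
    print(line)
```

Python's sort is stable, whereas sort(1) falls back to comparing whole lines when all keys tie; the sample has no ties on (field 1, field 3), so the model gives the same order here.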
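In theory a tab could also be written as sort -t"\t", but GNU sort rejects "\t" as a multi-character separator; in bash, sort -t$'\t' passes a real tab instead.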
If none of the above works for delimiting by tab, here is an ugly workaround:
TAB=`echo -e "\t"`
sort -t"$TAB" -k1,1d -k3,3g
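When sort is driven from a script rather than typed into a shell, the quoting problem disappears: each argument is passed as-is, and "\t" in a Python string literal is already a single real tab. A minimal sketch, assuming a sort binary on PATH that supports -d and -g (GNU coreutils does); sort_tsv is a hypothetical helper name, and LC_ALL=C is forced as an assumption so that "." is the decimal point for -g.

```python
import os
import subprocess

# Sample data from the question, with real tab characters between fields.
SAMPLE = (
    "a\t1\t1.0\nb\t2\t0.1\nc\t3\t0.3\na\t4\t0.001\nc\t5\t0.5\n"
    "a\t6\t0.01\nb\t7\t0.01\na\t8\t0.35\nb\t9\t2.3\nc\t10\t0.1\n"
    "c\t11\t1.0\nb\t12\t3.1\na\t13\t2.1\n"
)


def sort_tsv(text: str) -> str:
    """Run `sort -t<TAB> -k1,1d -k3,3g` on text and return its output."""
    result = subprocess.run(
        # "\t" is one tab character; no shell is involved, so no quoting.
        ["sort", "-t", "\t", "-k1,1d", "-k3,3g"],
        input=text,
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if sort fails
        # Assumption: the C locale keeps -g parsing "0.35" as a decimal.
        env={**os.environ, "LC_ALL": "C"},
    )
    return result.stdout


print(sort_tsv(SAMPLE), end="")
```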
Here is a Python script that you might use as a starting point:
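#!/usr/bin/env python3
import string
import sys

# Translation table that deletes ASCII punctuation (a rough stand-in for -d).
DROP_PUNCTUATION = str.maketrans("", "", string.punctuation)

def main():
    fname = sys.argv[1]
    data = []
    with open(fname) as stream:
        for line in stream:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            a, b, c = line.split("\t")  # fields are tab-separated
            data.append((a, int(b), float(c)))
    data.sort(key=my_key)
    for a, b, c in data:
        print(a, b, c, sep="\t")

def my_key(item):
    a, b, c = item
    # Sort by field 1 first, then by field 3 numerically.
    return lexicographical_key(a), c

def lexicographical_key(a):
    # poor man's attempt, should use Unicode classification etc.
    return a.translate(DROP_PUNCTUATION)

if __name__ == "__main__":
    main()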