unix sorting, with primary and secondary keys

开发者 https://www.devze.com 2023-01-06 04:01 Source: web

Related topics: bash sorting

I would like to sort a file on multiple fields. A sample tab-separated file is:

a   1   1.0
b   2   0.1
c   3   0.3
a   4   0.001
c   5   0.5
a   6   0.01
b   7   0.01
a   8   0.35
b   9   2.3
c   10  0.1
c   11  1.0
b   12  3.1
a   13  2.1

I would like to have it sorted alphabetically by field 1 (with -d) and, when field 1 is the same, sorted by field 3 (with the -g option).

I didn't succeed in doing this. My attempts were (with a real TAB character instead of <TAB>):

cat tst | sort -t"<TAB>" -k1 -k3n
cat tst | sort -t"<TAB>" -k1d -k3n
cat tst | sort -t"<TAB>" -k3n -k1d

None of these work. I'm not sure if sort is even able to do this. I'll write a script as a workaround, so I'm just curious whether there is a solution using only sort.

The manual shows some examples.

In accordance with zseder's comment, this works:

sort -t"<TAB>" -k1,1d -k3,3g
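
To make the difference between the attempts and the working command concrete, here is a minimal sketch run on the first six rows of the sample. The variable names (tab, sample, naive, sorted) are made up for illustration, LC_ALL=C is an assumption added to keep the comparison locale-independent, and -g is assumed to be available (it is in GNU sort).

```shell
#!/bin/sh
# Sketch: sort a shortened copy of the sample by field 1, then by field 3.
tab=$(printf '\t')

# First six rows of the sample file from the question, tab-separated.
sample=$(printf 'a\t1\t1.0\nb\t2\t0.1\nc\t3\t0.3\na\t4\t0.001\nc\t5\t0.5\na\t6\t0.01\n')

# A bare -k1 means "from field 1 to the end of the line", so the whole
# line is the primary key and the secondary key is never consulted.
naive=$(printf '%s\n' "$sample" | LC_ALL=C sort -t"$tab" -k1 -k3n)

# -k1,1d stops the primary key at field 1; -k3,3g then compares field 3
# as a general number whenever field 1 is equal.
sorted=$(printf '%s\n' "$sample" | LC_ALL=C sort -t"$tab" -k1,1d -k3,3g)

printf 'bare -k1:\n%s\n\n' "$naive"
printf '%s\n%s\n' '-k1,1d -k3,3g:' "$sorted"
```

With the bare -k1, the a rows come out in the order 1.0, 0.001, 0.01 (ordered by the text of field 2); with the bounded keys they come out as 0.001, 0.01, 1.0.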

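Note that sort -t"\t" does not work: the shell passes a literal backslash followed by t, and GNU sort rejects that as a multi-character delimiter. In bash, ANSI-C quoting produces a real tab: sort -t$'\t'.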
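
To see what the shell actually hands to sort for each spelling of the delimiter, the following sketch compares string lengths. The variable names are hypothetical and chosen only for this illustration.

```shell
#!/bin/sh
# Sketch: compare what the shell produces for two spellings of "tab".
literal="\t"              # inside double quotes this stays backslash + t
real_tab=$(printf '\t')   # printf interprets the escape and emits one tab

literal_len=${#literal}
real_tab_len=${#real_tab}

printf 'length of "\\t": %s\n' "$literal_len"
printf 'length of printf tab: %s\n' "$real_tab_len"
```

The first length is 2 and the second is 1, which is why sort only accepts the second form as a single-character field separator.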

If none of the above work to delimit by tab, this is an ugly workaround:

TAB=$(printf '\t')
sort -t"$TAB"

Here is a Python script that you might use as a starting point:

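#!/usr/bin/env python3

import string
import sys

# Translation table that deletes ASCII punctuation (rough stand-in for sort -d).
STRIP_PUNCTUATION = str.maketrans("", "", string.punctuation)


def main():
    fname = sys.argv[1]
    data = []
    with open(fname, "rt") as stream:
        for line in stream:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            a, b, c = line.split()
            data.append((a, int(b), float(c)))
    data.sort(key=my_key)
    for a, b, c in data:
        print(a, b, c, sep="\t")


def my_key(item):
    # Primary key: field 1 in dictionary order; secondary key: field 3 as a number.
    a, b, c = item
    return lexicographical_key(a), c


def lexicographical_key(a):
    # poor man's attempt, should use Unicode classification etc.
    return a.translate(STRIP_PUNCTUATION)


if __name__ == "__main__":
    main()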