开发者

In Python, how do you conserve grouping when you sort by one value and then another?

开发者 https://www.devze.com 2023-03-11 22:30 出处:网络
Data looks like this: Idx     score     group 5        0.85     Europe

Data looks like this:

Idx     score     group

5        0.85     Europe

8        0.77     Australia

12      0.70     S.America

13      0.71     Australia

42      0.82     Europe

45      0.90     Asia

65      0.91     Asia

73     开发者_JAVA百科 0.72     S.America

77      0.84     Asia

Needs to look like this:

Idx     score     group

65     0.91     Asia

77     0.84     Asia

45     0.73     Asia

12     0.87     S.America

73     0.72     S.America

5       0.85     Europe

42     0.82     Europe

8       0.83     Australia

13     0.71     Australia

See how Asia has the highest score and it shows me all of Asia's scores, then it's followed by the group which has the 2nd highest score and so on? I need to do this in Python. It's very different than sorting by one element and then sorting by another. Please help. Sorry if this question is redundant. I barely know how to ask it, let alone search for it.

I had it as a dictionary so that dict = {5:[0.85,Europe],8:[0.77,Australia]...} And I made a function that tried to parse the data:

def sortResults(dict):
   newDict = {}
   for k,v in dict.items():
      if v[-1] in newDict:
         sorDic[v[-1]].append((k,float(v[0]),v[1]))
      else:
         newDict[v[-1]] = [(k,float(v[0]),v[1])]
   for k in newDict.keys():
      for resList in newDict[k]:
         resList = sorted(resList,key=itemgetter(1),reverse=True)
   return sorDic

It says the float is unsubscriptable...I'm just confused.


I would just populate a dictionary with the maximum per group, and then sort on group maximum followed by individual score. Like this:

data = [
  (5 , 0.85, "Europe"),
  (8 , 0.77, "Australia"),
  (12, 0.70, "S.America"),
  (13, 0.71, "Australia"),
  (42, 0.82, "Europe"),
  (45, 0.90, "Asia"),
  (65, 0.91, "Asia"),
  (73, 0.72, "S.America"),
  (77, 0.84, "Asia")
]

maximums_by_group = dict()

for indx, score, group in data:
    if group not in maximums_by_group or maximums_by_group[group] < score:
        maximums_by_group[group] = score

data.sort(key=lambda e: (maximums_by_group[e[2]], e[1]), reverse=True)

for indx, score, group in data:
    print indx, score, group

This produces the expected output of

65 0.91 Asia
77 0.84 Asia
45 0.73 Asia
12 0.87 S.America
73 0.72 S.America
5 0.85 Europe
42 0.82 Europe
8 0.83 Australia
13 0.71 Australia


I think there's a better way to iterate than what i have here, but this works:

from operator import itemgetter

dataset = [
    { 'idx': 5, 'score': 0.85, 'group': 'Europe' },
    { 'idx': 8, 'score': 0.77, 'group': 'Australia' },
    { 'idx': 12, 'score': 0.70, 'group': 'S.America' },
    { 'idx': 13, 'score': 0.71, 'group': 'Australia' },
    { 'idx': 42, 'score': 0.82, 'group': 'Europe' },
    { 'idx': 45, 'score': 0.90, 'group': 'Asia' },
    { 'idx': 65, 'score': 0.91, 'group': 'Asia' },
    { 'idx': 73, 'score': 0.72, 'group': 'S.America' }
]

score_sorted = sorted(dataset, key=itemgetter('score'), reverse=True)

group_score_sorted = []
groups_completed = []
for score in score_sorted:
    group_name = score['group']
    if not group_name in groups_completed:
        groups_completed.append(group_name)

        for group in score_sorted:
            if group['group'] = group_name:
                group_score_sorted.append(group)

#group_score_sorted now contains sorted list


I think the easiest way is to separate first by groups and then doing the sort in two steps (first sort on max of group, second sort on score inside group).

data = [[ 5, 0.85, "Europe"],
        [ 8, 0.77, "Australia"],
        [12, 0.70, "S.America"],
        [13, 0.71, "Australia"],
        [42, 0.82, "Europe"],
        [45, 0.90, "Asia"],
        [65, 0.91, "Asia"],
        [73, 0.72, "S.America"],
        [77, 0.84, "Asia"]]

groups = {}
for idx, score, group in data:
    try:
        groups[group].append((idx, score, group))
    except KeyError:
        groups[group] = [(idx, score, group)]

for group in sorted((group for group in groups.keys()),
                    key = lambda g : -max(x[1] for x in groups[g])):
    for idx, score, group in sorted(groups[group], key = lambda g : -g[1]):
        print idx, score, group

The final result is

65 0.91 Asia
45 0.9  Asia
77 0.84 Asia
 5 0.85 Europe
42 0.82 Europe
 8 0.77 Australia
13 0.71 Australia
73 0.72 S.America
12 0.7  S.America

that is different from what you provided, but for the results in your question I think you have a typo because the score 0.87 for S.America is not present anywhwere in the input data.


The easiest way to do this is to dump the data into a list, because python dictionaries are unsorted. Then use the native timsort algorithm in python, which keeps runs or groupings during sorts.

So your code would be something like this:

data = [[ 5, 0.85, "Europe"],
        [ 8, 0.77, "Australia"],
        [12, 0.70, "S.America"],
        [13, 0.71, "Australia"],
        [42, 0.82, "Europe"],
        [45, 0.90, "Asia"],
        [65, 0.91, "Asia"],
        [73, 0.72, "S.America"],
        [77, 0.84, "Asia"]]

data.sort(key=lambda x: x[1], reverse=True)
data.sort(key=lambda x: x[2].upper())

This will produce:

[65, 0.91, 'Asia']
[45, 0.90, 'Asia']
[77, 0.84, 'Asia']
[8, 0.77, 'Australia']
[13, 0.71, 'Australia']
[5, 0.85, 'Europe']
[42, 0.82, 'Europe']
[73, 0.72, 'S.America']
[12, 0.70, 'S.America']


I like itertools and operator:

from itertools import groupby, imap
from operator import itemgetter

def sort_by_max(a_list):
    index, score, group = imap(itemgetter, xrange(3))
    a_list.sort(key=group)
    max_index = dict(
        (each, max(imap(index, entries)))
            for each, entries in groupby(a_list, group)
    )
    a_list.sort(key=lambda x:(-max_index[group(x)], -score(x)))

Used like this:

the_list = [
    [5, 0.85, 'Europe'],
    [8, 0.77, 'Australia'],
    [12, 0.87, 'S.America'],
    [13, 0.71, 'Australia'],
    [42, 0.82, 'Europe'],
    [45, 0.90, 'Asia'],
    [65, 0.91, 'Asia'],
    [73, 0.72, 'S.America'],
    [77, 0.84, 'Asia']
]
sort_by_max(the_list)
for each in the_list:
    print '{0:2} : {1:<4} : {2}'.format(*each)

gives:

65 : 0.91 : Asia
45 : 0.9  : Asia
77 : 0.84 : Asia
12 : 0.87 : S.America
73 : 0.72 : S.America
 5 : 0.85 : Europe
42 : 0.82 : Europe
 8 : 0.77 : Australia
13 : 0.71 : Australia

[EDIT]

Come to think of it, I also like defaultdict and max:

from collections import defaultdict

def sort_by_max(a_list):
    max_index = defaultdict(int)
    for index, score, group in a_list:
        max_index[group] = max(index, max_index[group])
    a_list.sort(key=lambda (index, score, group):(-max_index[group], -score))
0

精彩评论

暂无评论...
验证码 换一张
取 消