I'm completely new to Python and while trying various random bits and pieces I've struck upon a problem that I believe I've "solved", but the code doesn't feel right - I strongly suspect there is going to be a better way to get the desired result.
FYI - I'm using whatever the latest version of Python 3 is, on Windows.
Problem definition
Briefly, what I'm doing is sorting a list of pairs, such that the pairs containing the elements that appears in the fewest pairs are sorted to the front.
The pairs are in the form [i,j]
with 0 <= i <= j < n
, where n
is a known maximum value for the elements. There are no duplicate pairs within the list.
The count of an element i
is a simple count of the number of pairs (not pair elements) in the forms [i,j]
,[j,i]
and [i,i]
where j
is any value that results in a valid pair.
In the sorted result, a pair [i,j]
should appear before a pair [k,l]
if count(i) < count(k)
or count(i) == count(k)
and count(j) < count(l)
(If count(j) == count(l)
the two can be in either order - I'm not bothered about the sort being stable, would be a bonus though).
In the sorted result, a pair [i,j]
should appear before a pair [k,l]
if
min(count(i),count(j)) &l开发者_开发问答t; min(count(k),count(l))
or
min(count(i),count(j)) == min(count(k),count(l))
and max(count(i),count(j)) < max(count(k),count(l))
.
In otherwords, if the pair is [0,1]
and 1
has a count of one, but 0
has a count of four hundred, the pair should still be at (or at least very near) the front of the list - they need sorting by the least frequent element in the pair.
Here's a contrived example I've built:
input [[0,0],[1,2],[1,4],[2,2],[2,3],[3,3],[3,4]]
Here's the individual element counts and the source pairs they come from:
0: 1 [0,0]
1: 2 [1,2],[1,4]
2: 3 [1,2],[2,2],[2,3]
3: 3 [2,3],[3,3],[3,4]
4: 2 [1,4],[3,4]
And here's the result, along with the pair scores:
output: [[0,0],[1,4],[1,2],[3,4],[2,2],[2,3],[3,3]]
scores: 1 1-2 1-3 2-3 3 3 3
Here, 0
has a count of one (it appears in one pair, albeit twice) so comes first. 1
has a count of two, so appears second - with [1,4]
before [1,2]
because 4
has a count of two and 2
has a count of three, et cetera.
My current solution
As said, I believe this implimentation works accurately, but it just feels that there must be a better way to go about doing this. Anyway, here's what I've got so far:
#my implementation uncommented to reduce post size, see history for comments
def sortPairList( data , n ):
count = []
for i in range(0,n):
count.append( 0 )
#count up the data
for p in data:
count[p[0]] += 1
if p[1] != p[0]:
count[p[1]] += 1
maxcount = 0
for i in range(0,n):
if count[i] > maxcount:
maxcount = count[i]
def elementFrequency(p):
if count[ p[0] ] < count[ p[1] ]:
return count[ p[0] ] + float(count[ p[1] ]) / (maxcount+1)
else:
return count[ p[1] ] + float(count[ p[0] ]) / (maxcount+1)
data.sort( key=elementFrequency )
Any suggestions on a more "Python" way of doing this?
Or anything that's wrong with my current attempt?New Test Case (see answer's comments)
input: [[0,0],[0,3],[0,5],[0,7],[1,1],[1,2],[1,8],[2,4],[2,5],[3,4],[3,5],[3,9],[4,4],[4,7],[4,8],[6,8],[7,7],[7,9],[8,9]]
expected: [[6,8],[1,1],[1,2],[2,5],[0,5],[1,8],[3,5],[3,9],[7,9],[8,9],[2,4],[0,0],[0,3],[0,7],[7,7],[3,4],[4,7],[4,8],[4,4]]
I would probably use a Counter (needs Python ≥2.7 or ≥3.1) for tallying.
from collections import Counter
from itertools import chain
def sortPairList2(data):
tally = Counter(chain(*map(set, data)))
data.sort(key=lambda x: sorted(tally[i] for i in x))
Note that:
You can create an anonymous function with
lambda
. For example,>>> c = 4 >>> a = lambda p: p - c >>> a(7) 3
The sort key need not be a number. Anything comparable can be used as the return value of the key function. In my code, a
list
is used for ordering.There are many simpler idioms in Python for your original code.
- The
count
can be initialized usingcount = [0] * n
instead of that loop. - The
maxcount
can be obtained with themax
function.maxcount = max(count)
- The
List comprehension is used a lot in Python. If your target is to transform an iterable into another iterable, prefer comprehension over loops.
>>> n = 4
>>> freqs = {i: sum(i in j for j in inp) for i in range(n+1)}
>>> def key(x):
a, b = x
return min(freqs[a], freqs[b]), max(freqs[a], freqs[b])
>>> sorted(inp, key=key)
P.S. Please note that input
is a bad name for a variable as it shadows built-in.
While KennyTM solution works I tried to do it myself.
My solution precomputes frequencies and stores it in dictionary where str(n)
is key. I had some trouble with changing compare function known from Python2 to key used with Python3, but I found recipe at ActiveState code
item_cnt = {}
def icount(n):
return item_cnt[str(n)]
def add_item(n):
sn = str(n)
try:
item_cnt[sn] += 1
except KeyError:
item_cnt[sn] = 1
# sort callback
def cmp_items(ij, kl):
i, j = ij
k, l = kl
if icount(i) < icount(k) or icount(i) == icount(k) and icount(j) < icount(l):
return -1
return 1
input = [[0,0],[1,2],[1,4],[2,2],[2,3],[3,3],[3,4]]
# count all items
for (i, j) in input:
add_item(i)
add_item(j)
# works with Python 2.x
#input.sort(cmp_items)
# works with Python2.6 and Python 3.x
# to convert compare function to key look at:
# http://code.activestate.com/recipes/576653-convert-a-cmp-function-to-a-key-function/
input.sort(key=cmp_to_key(cmp_items))
print(input)
Similar to KennyTM's solution, but for Python 2.5 or greater:
import collections
def sort_by_occurence(sequences):
tally = collections.defaultdict(int)
for sequence in sequences:
for item in sequence:
tally[item] += 1
sequences.sort(key=lambda x:map(tally.get, x))
pair_list = [[0,0],[1,2],[1,4],[2,2],[2,3],[3,3],[3,4]]
sort_by_occurence(pair_list)
print pair_list
精彩评论