Sorting a counted list in Python

Developer https://www.devze.com 2023-03-31 10:56 Source: Internet
(I am brand new to any kind of programming, so please be as specific as you can when you answer.)

Problem: I have written a program to solve pythonchallenge.com level 2. The program works, but the results are messy. I want to sort the results of the character count into a nice-looking list. When I try to sort the results of the character count using sorted(), it drops all the counts and just gives me a list of the characters that were in my string. I need to keep being able to see how many of each character were in my file. Anyway, here is the code:

countstring = open('pagesource.txt').read()

charcount = {}

for x in countstring:
    charcount[x] = charcount.get(x, 0) + 1

print charcount

This is what I get in cmd:

>>> {'\n': 1219, '!': 6079, '#': 6115, '%': 6104, '$': 6046, '&': 6043, ')': 6186, '(': 6154, '+': 6066, '*': 6034, '@': 6157, '[': 6108, ']': 6152, '_': 6112, '^': 6030, 'a': 1, 'e': 1, 'i': 1, 'l': 1, 'q': 1, 'u': 1, 't': 1, 'y': 1, '{': 6046, '}': 6105}

If I wrap it in sorted(), for example print sorted(charcount), I get this in cmd:

>>> ['\n', '!', '#', '$', '%', '&', '(', ')', '*', '+', '@', '[', ']', '^', '_', 'a', 'e', 'i', 'l', 'q', 't', 'u', 'y', '{', '}']

Thanks for your solutions! If you can take the time to add comments to your code explaining what everything does, I would greatly appreciate it.


You should really use the Counter class (collections.Counter) instead of reinventing the wheel.
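A minimal sketch of that suggestion, written for Python 3 (collections.Counter also exists in Python 2.7). The short string text is a hypothetical stand-in for the contents of pagesource.txt; with the real file you would pass open('pagesource.txt').read() instead.

```python
from collections import Counter

# Hypothetical sample text standing in for the contents of pagesource.txt
text = "a!!#b!#!"

# Counter tallies every character in one pass, like the charcount loop above
charcount = Counter(text)

# most_common() returns (char, count) pairs, highest count first
for char, count in charcount.most_common():
    print(repr(char), count)
```

most_common() already returns a sorted list of (character, count) pairs, so no separate sorting step is needed; called with a number, e.g. most_common(2), it returns only that many pairs.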

charcount is a dictionary, and dictionaries have no implicit sort order. Therefore, we'll have to convert it to a list, which can be sorted. Each entry in that list will be a tuple of character and count.

charcount.items() already gives us a list that looks like [('\n', 1219), ('!', 6079)]. Unfortunately, if we sorted this list, it would be ordered by character first and then (if characters were ever equal) by count, instead of the other way round. Therefore, we need a key function that tells sort to look at the count first and then (if counts are equal) the character. Fortunately, our key function is really simple; it just swaps the tuple around:

lambda item: (item[1], item[0])

Alternatively, we could use a list comprehension to swap the values, giving something like [(1219, '\n'), (6079, '!')], then sort, and then swap the values back again.
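The list-comprehension alternative (often called decorate-sort-undecorate) could look roughly like this in Python 3; the small charcount dictionary is hypothetical sample data rather than the real page counts.

```python
# Hypothetical counts standing in for the full charcount dictionary
charcount = {'!': 4, '#': 2, 'a': 1, 'b': 1}

# Swap each (char, count) pair to (count, char) so tuples compare by count first
swapped = [(count, char) for char, count in charcount.items()]

# Default tuple ordering now sorts by count, then by character on ties
swapped.sort()

# Swap back to (char, count) for display
charcount_list = [(char, count) for count, char in swapped]
print(charcount_list)
```

This gives the same result as the key function version; the key function is usually preferred because it avoids building the two intermediate lists by hand.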

charcount_list = sorted(charcount.items(), key=lambda item: (item[1], item[0]))

charcount_list will now be:

[('a', 1), ('e', 1), ('i', 1), ('l', 1), ('q', 1), ('t', 1), ('u', 1), ('y', 1),
 ('\n', 1219), ('^', 6030), ('*', 6034), ('&', 6043), ('$', 6046), ('{', 6046),
 ('+', 6066), ('!', 6079), ('%', 6104), ('}', 6105), ('[', 6108), ('_', 6112),
 ('#', 6115), (']', 6152), ('(', 6154), ('@', 6157), (')', 6186)]

If you want the reverse order, simply specify the reverse=True argument to sorted.
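A short Python 3 sketch of reversing the order, with hypothetical sample counts. Note that reverse=True also reverses the tie-break, so characters with equal counts come out in reverse alphabetical order; negating the count instead keeps ties alphabetical.

```python
# Hypothetical sample counts
charcount = {'!': 4, '#': 2, 'a': 1, 'b': 1}

# Same (count, char) key as above; reverse=True puts the largest counts first
most_first = sorted(charcount.items(), key=lambda item: (item[1], item[0]), reverse=True)
print(most_first)

# Negating the count sorts most frequent first while keeping ties alphabetical
most_first_alpha = sorted(charcount.items(), key=lambda item: (-item[1], item[0]))
print(most_first_alpha)
```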


>>> from operator import itemgetter
>>> sorted(charcount.items(), key=itemgetter(1))
[('a', 1), ('e', 1), ('i', 1), ('l', 1), ('q', 1), ('u', 1), ('t', 1), ('y', 1), ('\n', 1219), ('^', 6030), ('*', 6034), ('&', 6043), ('$', 6046), ('{', 6046), ('+', 6066), ('!', 6079), ('%', 6104), ('}', 6105), ('[', 6108), ('_', 6112), ('#', 6115), (']', 6152), ('(', 6154), ('@', 6157), (')', 6186)]
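With itemgetter(1) alone, characters that share a count keep whatever order the dictionary produced them in, because Python's sort is stable; that is why ('u', 1) comes before ('t', 1) above. A hedged Python 3 sketch, using hypothetical sample counts, that passes two indices to itemgetter so ties are broken by character:

```python
from operator import itemgetter

# Hypothetical sample counts, deliberately inserted out of alphabetical order
charcount = {'b': 1, '!': 4, 'a': 1, '#': 2}

# itemgetter(1, 0) returns (count, char), so ties on count fall back to the character
by_count_then_char = sorted(charcount.items(), key=itemgetter(1, 0))
print(by_count_then_char)
```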


charcount is a dict (dictionary). Iterating over a dictionary yields its keys, which is why sorted() gives you a sorted list of keys.

You need to get the list of items and then sort it by the second value (the count):

sorted(charcount.items(), key=lambda t: t[1])
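To make the difference concrete, here is a small Python 3 sketch with hypothetical sample counts, contrasting sorting the dictionary itself with sorting its items.

```python
# Hypothetical sample counts
charcount = {'!': 4, 'a': 1}

# Iterating a dict (which is what sorted() does with it) yields only the keys
keys_only = sorted(charcount)
print(keys_only)

# .items() yields (key, value) pairs, so the counts survive the sort
by_count = sorted(charcount.items(), key=lambda t: t[1])
print(by_count)
```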


Dictionaries (what {} creates) are unordered collections, which means you can't sort them directly in any meaningful way. I suggest storing the information as a list of tuples [(), ...] and then sorting that list.

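# A list of (character, count) tuples
foo = [('a', 123), ('b', 345)]

# Key function: given one tuple, return the value to sort by (the count)
def key_function(x):
    return x[1]

# key= must be passed by name; in Python 2 the second positional argument is cmp
sorted_list = sorted(foo, key=key_function)
print sorted_list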

As you can see, sorted takes an optional key parameter. Its purpose is to provide a function that tells sorted what to sort by: the function pulls a value out of each tuple in the list, and sorted orders the tuples by those values. Without it, tuples are compared element by element starting with the first, so this list would be ordered by character rather than by count.

Make sense?
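A quick Python 3 sketch of why the key matters; the list pairs is hypothetical, chosen so that letter order and count order differ.

```python
# Hypothetical list, chosen so that letter order and count order differ
pairs = [('b', 1), ('a', 345), ('c', 123)]

# Without a key, tuples compare element by element: first by letter
by_letter = sorted(pairs)
print(by_letter)

# With a key, only the returned value (the count) is compared
by_count = sorted(pairs, key=lambda x: x[1])
print(by_count)
```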

It can also be written like this (tuple parameters in a lambda only work in Python 2): print sorted(foo, key=lambda (x,y): y)

lambda just creates an inline function with no name, and here it lets you unpack the tuple right in its parameter list.

You can see how this works by doing print [y for (x,y) in sorted_list]

You can even redefine the key function from before like this:

def key_function(x):
    char, count = x  # unpack the (character, count) tuple
    return count

BTW, I only put in the parentheses earlier for clarity. Unless you're writing a function's parameter list, it's the comma, not the parentheses, that makes a tuple.
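Tuple parameters such as lambda (x,y): y were removed in Python 3 (PEP 3113), so a sketch of the same approach for Python 3 unpacks the tuple inside the function body instead. The sample list is hypothetical.

```python
# Hypothetical (character, count) pairs
foo = [('a', 123), ('b', 345), ('c', 7)]

def key_function(pair):
    # Unpack inside the body; Python 3 does not allow tuple parameters
    char, count = pair
    return count

sorted_list = sorted(foo, key=key_function)
print(sorted_list)

# Unpacking in a for clause still works, with or without parentheses
counts = [count for (char, count) in sorted_list]
print(counts)
```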


sorted(charcount.items(), key=lambda item: item[1])


A dictionary is iterated by key, so passing it to sorted gives you a sorted list of keys. Sort the dictionary's item tuples by value instead to get a sorted list of tuples.

sorted_charcount = sorted(charcount.items(), key=lambda item: item[1])

If you're using Python 2.7+, then you can use the list of tuples to initialize an OrderedDict, which will maintain the sorted order of item tuples.
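A minimal sketch of the OrderedDict idea, written for Python 3 with hypothetical sample counts:

```python
from collections import OrderedDict

# Hypothetical sample counts
charcount = {'!': 4, 'a': 1, '#': 2}

# List of (char, count) tuples, sorted by count
sorted_charcount = sorted(charcount.items(), key=lambda item: item[1])

# OrderedDict remembers insertion order, so it keeps the sorted sequence
ordered = OrderedDict(sorted_charcount)
print(list(ordered.items()))

# Lookups by character still work like a normal dict
print(ordered['#'])
```

Since Python 3.7, a plain dict also preserves insertion order, so dict(sorted_charcount) would iterate in the same sorted order; OrderedDict remains the choice on Python 2.7.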
