
Python - Match Words in Text File to Dictionary and Manipulate Value

Developer https://www.devze.com 2023-01-31 21:39 Source: Internet
I have a dictionary where the keys are simple words and the values are a score. I want to calculate a total score based on how often each dictionary word occurs in a file (or string) and the score (value) stored for it in the dictionary. For example, suppose my text was:

"Dogs are great pets and hamsters are bad pets. That is why I want a dog"

My dictionary is:

Dict = {'dogs' : 5, 'hampsters' : -2}

Then I would want to calculate a score of 8 ((2x5)-2 = 8). I can find occurrences of the dictionary keys in the text with

    for key in Dict:
        m = re.findall(key, READ, re.IGNORECASE)

but I haven't been able to access the value of the key in a useful manner.

Any help is greatly appreciated.

Thanks, Scott


EDIT: Steve V inspired the following, which is rather nicer:

sentence = "...".split()
score = sum(sentence.count(word) * score for word, score in scores.items())
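
As a rough illustration of the `count`-based version, here is a minimal sketch that fills in the elided sentence with the example text from the question. The lowercasing step and the variable name `value` are assumptions added here, not part of the snippet above.

```python
s = "Dogs are great pets and hamsters are bad pets. That is why I want a dog."
scores = {'dogs': 5, 'hamsters': -2}

# Lowercase before splitting so "Dogs" matches the key 'dogs'.
sentence = s.lower().split()

# list.count rescans the whole list once per dictionary key.
score = sum(sentence.count(word) * value for word, value in scores.items())
print(score)  # 3: one 'dogs' (5) and one 'hamsters' (-2); "dog." is not counted
```

This is fine for a small dictionary; for many keys, counting all words once (as in the `Counter` version) avoids the repeated scans.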

The obligatory one-liner:

>>> s = "Dogs are great pets and hamsters are bad pets. That is why I want a dog."
>>> scores = {'dogs': 5, 'hamsters': -2}
>>> import collections
>>> sum(scores.get(word.lower(), 0) * freq for word, freq in collections.Counter(s.split()).items())
3

and split up:

>>> total = 0
>>> counts = collections.Counter(s.split())
>>> for word, freq in counts.items():
...     total += scores.get(word.lower(), 0) * freq
...
>>> total
3

Notable features:

  • The score isn't 8 (as you claimed above) but 3, because the word dogs only appears once in the string you gave. If you want to count the word dog twice, you will need a (much) more complicated algorithm, probably interfacing with a pluralisation library to handle cases like child -> children and man -> men. This will not be easy or necessarily correct.

  • I've included .lower() to ignore capitalisation in the string you gave. If you don't want that, just remove the call.

  • You misspelt "hamster" :p.
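
To make the pluralisation point concrete, here is a minimal sketch that does reach 8 on the example. It assumes a deliberately crude rule (strip one trailing "s" from words longer than three letters) in place of a real pluralisation library, and it assumes the dictionary keys are stored in singular form. The helper names `naive_singular` and `score_text` are made up for illustration.

```python
import collections
import re


def naive_singular(word):
    """Crude stand-in for a pluralisation library: strip one trailing 's'."""
    return word[:-1] if word.endswith("s") and len(word) > 3 else word


def score_text(text, scores):
    # Lowercase and keep only alphabetic runs, so "dog." and "Dogs" normalise alike.
    words = re.findall(r"[a-z]+", text.lower())
    counts = collections.Counter(naive_singular(w) for w in words)
    return sum(scores.get(word, 0) * freq for word, freq in counts.items())


s = "Dogs are great pets and hamsters are bad pets. That is why I want a dog."
scores = {"dog": 5, "hamster": -2}  # assumption: keys are singular
print(score_text(s, scores))  # 8: two "dog" (10) and one "hamster" (-2)
```

The rule breaks on irregular plurals (children, men) and mangles words such as "this", which is exactly why a proper library would be needed for anything beyond a toy example.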


Use katrielalex's answer if possible; it's cleaner than mine. If, like me, you don't have Python 2.7, this may work for you:

import re

sentence = "Dogs are great pets and hamsters are bad pets. That is why I want a dog"
scores = {'dog': 5, 'hamster': -2}

# Count case-insensitive matches of each key; substring matching lets 'dog' match 'Dogs'.
occurrences = {}
for key in scores:
    m = re.findall(key, sentence, re.IGNORECASE)
    occurrences[key] = len(m)

# Weight each count by the key's score.
totalScore = 0
for word in occurrences:
    totalScore += scores.get(word.lower(), 0) * occurrences[word]

print(totalScore)

I did "dogs" -> "dog" in your scores dictionary, on the assumption that it was a typo. If you change it back, your result will be 3 without pluralization.
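
One caveat with plain `re.findall(key, ...)` is that it matches substrings, so 'dog' would also be counted inside a word like "dogma". A minimal sketch of a stricter variant follows; it assumes that only an optional trailing "s" should be tolerated, and the helper name `count_matches` is hypothetical.

```python
import re

sentence = "Dogs are great pets and hamsters are bad pets. That is why I want a dog"
scores = {'dog': 5, 'hamster': -2}


def count_matches(word, text):
    # \b anchors the match at word boundaries; s? optionally accepts a plural "s".
    # re.escape guards against keys that contain regex metacharacters.
    pattern = r"\b" + re.escape(word) + r"s?\b"
    return len(re.findall(pattern, text, re.IGNORECASE))


total_score = sum(value * count_matches(word, sentence)
                  for word, value in scores.items())
print(total_score)  # 8: "Dogs" and "dog" (10) plus "hamsters" (-2)
```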


This should work:

import re

mtext = "Dogs are great pets and hamsters are bad pets. That is why I want a dog"
for key in Dict:
    p = re.compile(key, re.IGNORECASE)
    NuOfMatches = len(p.findall(mtext))  # number of occurrences of key in mtext


Another variation of katrielalex's answer, for people stuck with Python 2.6.

Put this snippet in a file (counter.py, for instance): http://code.activestate.com/recipes/576611/

Then you can use the following code:

from counter import Counter

total = 0  # running score; avoids shadowing the built-in sum
counts = Counter(text.split())
for word, freq in counts.items():
    total += scores.get(word.lower(), 0) * freq
...

Pretty much the same, except it works with older Python versions.
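
If adding a separate counter.py is not an option, the same idea can be sketched with a plain dict as a fallback. This is only an illustration, not the ActiveState recipe: the helper name `count_words` is made up here, and `text` and `scores` are filled in with the example values from the question (with the corrected spelling).

```python
try:
    from collections import Counter  # available from Python 2.7 onwards
except ImportError:
    Counter = None  # older interpreters: fall back to a plain dict below


def count_words(words):
    """Return a mapping of word -> frequency, with or without collections.Counter."""
    if Counter is not None:
        return Counter(words)
    counts = {}
    for word in words:
        counts[word] = counts.get(word, 0) + 1
    return counts


text = "Dogs are great pets and hamsters are bad pets. That is why I want a dog."
scores = {'dogs': 5, 'hamsters': -2}

total = 0
for word, freq in count_words(text.split()).items():
    total += scores.get(word.lower(), 0) * freq
print(total)  # 3, matching the Counter-based answer above
```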
