Python - Match Words in Text File to Dictionary and Manipulate Value


Developer https://www.devze.com 2023-01-31 21:39 Source: Internet


I have a dictionary where the keys are simple words and the values are scores. I want to calculate a total score based on how often each dictionary word occurs in a file (or string) and on the score (value) stored for that word. For example, suppose my text was:

"Dogs are great pets and hamsters are bad pets. That is why I want a dog"

My dictionary is:

Dict = {'dogs' : 5, 'hampsters' : -2}

Then I would want to calculate a score of 8 ((2x5)-2 = 8). I can find occurrences of the dictionary keys in the text with

    for key in Dict:
        m = re.findall(key, READ, re.IGNORECASE)

but I haven't been able to access the value of the key in a useful manner.

Any help is greatly appreciated.

Thanks, Scott

EDIT: Steve V inspired the following, which is rather nicer:

sentence = "...".split()
score = sum(sentence.count(word) * score for word, score in scores.items())

The obligatory one-liner:

>>> s = "Dogs are great pets and hamsters are bad pets. That is why I want a dog."
>>> scores = {'dogs': 5, 'hamsters': -2}
>>> import collections
>>> sum(scores.get(word.lower(), 0) * freq for word, freq in collections.Counter(s.split()).items())
3

and split up:

>>> total = 0  # named "total" so the built-in sum() is not shadowed
>>> counts = collections.Counter(s.split())
>>> for word, freq in counts.items():
...     total += scores.get(word.lower(), 0) * freq
...
>>> total
3

Notable features:

The score isn't 8 (as you claimed above) but 3, because the word dogs only appears once in the string you gave. If you want to count the word dog twice, you will need a (much) more complicated algorithm, probably interfacing with a pluralisation library to handle cases like child -> children and man -> men. This will not be easy or necessarily correct.
I've included .lower() to ignore capitalisation in the string you gave. If you don't want that, just remove the call.
You misspelt "hamster" :p.
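
To illustrate the first two points, here is a minimal sketch that strips punctuation before counting and folds a trailing "s" onto a singular key. The function name score_text and the plural rule are assumptions made up for this example; the rule is deliberately naive and will not handle irregular plurals such as child -> children.

```python
import collections
import re


def score_text(text, scores):
    """Sum scores[word] * frequency over the words in text (hypothetical helper)."""
    # Lowercase first, then pull out runs of letters so "dog." becomes "dog".
    words = re.findall(r"[a-z]+", text.lower())
    counts = collections.Counter(words)

    total = 0
    for word, freq in counts.items():
        if word not in scores and word.endswith("s") and word[:-1] in scores:
            # Naive assumption: a trailing "s" marks a regular plural.
            word = word[:-1]
        total += scores.get(word, 0) * freq
    return total


s = "Dogs are great pets and hamsters are bad pets. That is why I want a dog."
print(score_text(s, {"dogs": 5, "hamsters": -2}))  # exact forms only: 3
print(score_text(s, {"dog": 5, "hamster": -2}))    # plurals folded in: 8
```

With the plural keys from the question only the exact forms match, so the result stays at 3; with singular keys the plural forms are folded in and the result is the 8 the question expected.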

Use katrielalex's answer if possible; it's cleaner than mine. If you don't have Python 2.7 (like me), this may work for you:

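import re

sentence = "Dogs are great pets and hamsters are bad pets. That is why I want a dog"

scores = {'dog': 5, 'hamster': -2}

# Count case-insensitive matches of each key; note that this also matches substrings.
occurrences = {}

for key in scores:
    m = re.findall(key, sentence, re.IGNORECASE)
    occurrences[key] = len(m)

# Weight each count by the score stored for that word.
totalScore = 0

for word in occurrences:
    totalScore += scores[word] * occurrences[word]

print(totalScore)  # the parentheses keep this valid in both Python 2 and Python 3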

I changed "dogs" to "dog" in your scores dictionary, on the assumption that it was a typo. If you change it back, your result will be 3, because the singular "dog" will no longer be counted.
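
One caveat with re.findall(key, ...) is that it matches substrings, so a key like 'dog' would also count "dogma". A possible refinement, sketched below, anchors each key at word boundaries and allows an optional trailing "s". The helper name count_word is made up for this example, and the optional "s" is only a rough stand-in for real pluralisation.

```python
import re


def count_word(word, text):
    """Count whole-word, case-insensitive matches of word, with an optional plural "s"."""
    # re.escape guards against keys that contain regex metacharacters.
    pattern = r"\b" + re.escape(word) + r"s?\b"
    return len(re.findall(pattern, text, re.IGNORECASE))


sentence = "Dogs are great pets and hamsters are bad pets. That is why I want a dog"
scores = {"dog": 5, "hamster": -2}

total_score = sum(value * count_word(word, sentence) for word, value in scores.items())
print(total_score)  # 2 * 5 + 1 * -2 = 8
```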

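This should work:

import re

mtext = "Dogs are great pets and hamsters are bad pets. That is why I want a dog"
for key in Dict:  # Dict is the word -> score dictionary from the question
    p = re.compile(key, re.IGNORECASE)  # match the current key, not a hard-coded 'dog'
    count = len(p.findall(mtext))  # number of occurrences of key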

Another variation of katrielalex's answer, for people stuck with Python 2.6:

Put this snippet in a file (counter.py, for instance): http://code.activestate.com/recipes/576611/

then you can use the following code:

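from counter import Counter

total = 0  # start the running total; avoids shadowing the built-in sum()
counts = Counter(text.split())  # text is the input string, scores the word -> score dict
for word, freq in counts.items():
    total += scores.get(word.lower(), 0) * freq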
...

Pretty much the same, except that it works with older versions of Python.
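
The question mentions matching words in a file rather than a string. As a rough sketch, the same counting can be done line by line so that the whole file never has to be held in memory. The function name score_file and the file name pets.txt are placeholders, not anything from the original post, and the code assumes collections.Counter is available (Python 2.7 or later).

```python
import collections
import os
import tempfile


def score_file(path, scores):
    """Accumulate word counts line by line, then weight them by scores (illustrative)."""
    counts = collections.Counter()
    with open(path) as handle:
        for line in handle:
            # Plain whitespace split, so a token like "dog." would still not match "dog".
            counts.update(word.lower() for word in line.split())
    return sum(scores.get(word, 0) * freq for word, freq in counts.items())


# Usage: write a small sample file, then score it.
sample_path = os.path.join(tempfile.mkdtemp(), "pets.txt")  # placeholder file name
with open(sample_path, "w") as handle:
    handle.write("Dogs are great pets and hamsters are bad pets.\n")
    handle.write("That is why I want a dog\n")

print(score_file(sample_path, {"dogs": 5, "hamsters": -2}))  # 5 - 2 = 3
```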
