I want to count how many lines contain a word that matches one of the keywords I have chosen, so I wrote this:
    for each_keyword in keywords:
        if each_keyword in text:
            related_tweet_count += 1
            print "related_tweet_count", related_tweet_count
            print text
It worked well, but it has a problem. For example, with the keyword "flu" it matches not only "flu" but also "influence". To solve this, I looked up examples of whole-word matching and changed the code like this:
    for each_keyword in keywords:
        if re.search('\beach_keyword\b', text, re.I):
            related_tweet_count += 1
            print "related_tweet_count", related_tweet_count
            print text
But it doesn't work. Please help me out!
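You need to actually substitute each_keyword into the regular expression. At the moment it is literally trying to match the text "each_keyword". Also, in a normal (non-raw) string '\b' is a backspace character rather than a word boundary, so the backslash has to be doubled or the pattern written as a raw string.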
    import re

    for each_keyword in keywords:
        # re.escape keeps any special characters in the keyword literal
        if re.search('\\b' + re.escape(each_keyword) + '\\b', text, re.I):
            related_tweet_count += 1
            print "related_tweet_count", related_tweet_count
            print text
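If the goal is to count each text once no matter how many keywords it contains, one option is to combine all keywords into a single compiled pattern. The following is only a sketch under assumptions: the names build_keyword_pattern, count_related and tweets are made up for illustration and do not come from the question, and the keyword list is assumed to be non-empty.

```python
import re

def build_keyword_pattern(keywords):
    """Compile one case-insensitive pattern that matches any keyword as a whole word."""
    # re.escape keeps regex metacharacters inside a keyword literal.
    alternatives = "|".join(re.escape(k) for k in keywords)
    # Raw strings so that \b stays a word boundary instead of a backspace.
    return re.compile(r"\b(?:" + alternatives + r")\b", re.IGNORECASE)

def count_related(texts, keywords):
    """Count the texts that contain at least one keyword (each text counted once)."""
    if not keywords:
        return 0  # an empty alternation would match at every word boundary
    pattern = build_keyword_pattern(keywords)
    return sum(1 for text in texts if pattern.search(text))

# Hypothetical sample data for illustration only.
tweets = [
    "I think I have the flu",
    "Her influence is growing",
    "FLU season again, flu shots available",
]
print(count_related(tweets, ["flu", "cold"]))  # prints 2
```

The second tweet is skipped because "flu" inside "influence" is not surrounded by word boundaries, and the third is counted once even though it contains the keyword twice.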
Alternatively, do it without regular expressions and check for several variations of each keyword with surrounding spaces and punctuation:
    for keyword in keywords:
        kw_list = [' ' + keyword + ',', ' ' + keyword + ' ', ' ' + keyword + '.', '. ' + keyword]
        for kw in kw_list:
            if kw in text:
                related_tweet_count += 1
                break  # count this keyword only once per text
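A fixed list of variations can miss cases such as a keyword at the very start of the text or one followed by "!" or "?". A different regex-free approach, swapped in here as a sketch rather than as the method from the answer above, is to split the text on whitespace and strip punctuation from each token. The function name contains_keyword is hypothetical, and the sketch assumes single-word keywords.

```python
import string

def contains_keyword(text, keywords):
    """Return True if any whitespace-separated token equals a keyword.

    Assumes single-word keywords; comparison is case-insensitive.
    """
    wanted = set(k.lower() for k in keywords)
    for token in text.lower().split():
        # Drop leading and trailing punctuation such as "flu," or "(flu)".
        if token.strip(string.punctuation) in wanted:
            return True
    return False

print(contains_keyword("Flu, again!", ["flu"]))         # prints True
print(contains_keyword("Under her influence", ["flu"]))  # prints False
```

Unlike a \b word boundary, this treats hyphenated tokens such as "flu-like" as a single word, so they would not match "flu".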