I need to search a string for multiple words.
import re
words = [{'word':'test1', 'case':False}, {'word':'test2', 'case':False}]
status = "test1 test2"
for w in words:
    if w['case']:
        r = re.compile("\s#?%s" % w['word'], re.IGNORECASE|re.MULTILINE)
    else:
        r = re.compile("\s#?%s" % w['word'], re.MULTILINE)
    if r.search(status):
        print "Found word %s" % w['word']
For some reason, this will only ever find "test2" and never "test1". Why is this?
I know I can use |-delimited searches, but there could be hundreds of words, which is why I am using a for loop.
There is no space before test1 in status, while your generated regular expressions require there to be a space.
You can modify the test to match either after a space or at the beginning of a line:
for w in words:
    if w['case']:
        r = re.compile("(^|\s)#?%s" % w['word'], re.IGNORECASE|re.MULTILINE)
    else:
        r = re.compile("(^|\s)#?%s" % w['word'], re.MULTILINE)
    if r.search(status):
        print "Found word %s" % w['word']
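To see the difference concretely, here is a small Python 3 sketch that runs both prefixes against the same status string. The helper name find_words is hypothetical (it is not from the question), and re.escape is an addition so that words containing regex metacharacters are matched literally. The flag logic is kept exactly as in the question.

```python
import re

words = [{'word': 'test1', 'case': False}, {'word': 'test2', 'case': False}]
status = "test1 test2"

def find_words(prefix, words, text):
    # Hypothetical helper: return the words whose pattern
    # (prefix + optional '#' + word) is found somewhere in text.
    found = []
    for w in words:
        # Same flag choice as in the question: 'case' True adds IGNORECASE.
        flags = re.MULTILINE | (re.IGNORECASE if w['case'] else 0)
        pattern = prefix + "#?" + re.escape(w['word'])
        if re.search(pattern, text, flags):
            found.append(w['word'])
    return found

original = find_words(r"\s", words, status)    # requires leading whitespace
fixed = find_words(r"(^|\s)", words, status)   # also accepts start of a line

print(original)  # ['test2']
print(fixed)     # ['test1', 'test2']
```

With the original prefix only test2 is reported, because test1 sits at the very start of the string with nothing before it.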
As Martijn pointed out, there's no space before test1. But your code also doesn't properly handle the case where a word is longer: it would find test2blabla as an instance of test2, and I'm not sure if that is what you want.
I suggest using the word-boundary regex \b:
for w in words:
    if w['case']:
        r = re.compile(r"\b%s\b" % w['word'], re.IGNORECASE|re.MULTILINE)
    else:
        r = re.compile(r"\b%s\b" % w['word'], re.MULTILINE)
    if r.search(status):
        print "Found word %s" % w['word']
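As a quick check of the word-boundary behaviour, the following Python 3 sketch may help. The function name has_word is made up for illustration, and re.escape is an addition to the pattern shown above.

```python
import re

def has_word(word, text):
    # True if word occurs in text with a word boundary on both sides.
    return re.search(r"\b%s\b" % re.escape(word), text) is not None

print(has_word("test1", "test1 test2"))        # True: start of string counts as a boundary
print(has_word("test2", "test1 test2"))        # True
print(has_word("test2", "test1 test2blabla"))  # False: no boundary between '2' and 'b'
```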
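EDIT: I should've pointed out that if you really want to allow only the (whitespace)word or (whitespace)#word format, you cannot use \b, since \b also matches after any non-word character (a hyphen, for example), not only after whitespace.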
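One possible way to keep both requirements is a negative lookbehind in front and \b behind. This is only a sketch under assumptions, not something from the answers above: tag_pattern is a hypothetical name, and the trailing \b only behaves as intended if the word ends in a letter, digit or underscore.

```python
import re

def tag_pattern(word):
    # (?<!\S): the previous character is not a non-whitespace character,
    # i.e. we are at the start of the string or right after whitespace.
    # The trailing \b still rejects longer words such as test1blabla.
    return re.compile(r"(?<!\S)#?%s\b" % re.escape(word))

p = tag_pattern("test1")
print(bool(p.search("test1 test2")))     # True: start of string
print(bool(p.search("see #test1 now")))  # True: whitespace, then '#'
print(bool(p.search("foo-test1")))       # False: preceded by '-'
print(bool(p.search("a test1blabla")))   # False: word continues

# For comparison, plain \b accepts the hyphenated case:
print(bool(re.search(r"\btest1\b", "foo-test1")))  # True
```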