
Python regex multiple search

Developer (https://www.devze.com), 2023-03-09 12:39, source: web

I need to search a string for multiple words.

import re

words = [{'word':'test1', 'case':False}, {'word':'test2', 'case':False}]

status = "test1 test2"

for w in words:
    if w['case']:
        r = re.compile(r"\s#?%s" % w['word'], re.IGNORECASE|re.MULTILINE)
    else:
        r = re.compile(r"\s#?%s" % w['word'], re.MULTILINE)
    if r.search(status):
        print("Found word %s" % w['word'])

For some reason, this will only ever find "test2" and never "test1". Why is this?

I know I could use a |-delimited search, but there could be hundreds of words, which is why I am using a for loop.


There is no space before test1 in status, while your generated regular expressions require one: \s must match a whitespace character immediately before the word (or before the optional #).
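
A quick check against the question's status string shows the difference: test1 has nothing in front of it, so \s can never be satisfied, while test2 is found together with the space that precedes it.

```python
import re

status = "test1 test2"

# \s has to consume one whitespace character in front of the word,
# so a word at the very start of the string can never match.
miss = re.search(r"\s#?test1", status)
hit = re.search(r"\s#?test2", status)

print(miss)               # None
print(repr(hit.group()))  # ' test2' (the match includes the space)
```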

You can modify the test to match either after a space or at the beginning of a line:

for w in words:
    # (^|\s): the word must start a line (re.MULTILINE) or follow whitespace
    if w['case']:
        r = re.compile(r"(^|\s)#?%s" % w['word'], re.IGNORECASE|re.MULTILINE)
    else:
        r = re.compile(r"(^|\s)#?%s" % w['word'], re.MULTILINE)
    if r.search(status):
        print("Found word %s" % w['word'])
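
As an alternative to (^|\s), not something this answer itself uses, a negative lookbehind (?<!\S) expresses the same condition ("start of the string, or right after whitespace", which also covers the position after a newline) without consuming the whitespace, so the match is just the word. The sketch below also runs each word through re.escape, an extra safeguard not in the original code, so that words containing characters like . or + are matched literally.

```python
import re

words = [{'word': 'test1', 'case': False}, {'word': 'test2', 'case': False}]
status = "test1 test2"

found = []
for w in words:
    # (?<!\S): not preceded by a non-whitespace character, i.e. the word
    # starts the string or follows whitespace; the lookbehind consumes nothing.
    pattern = r"(?<!\S)#?%s" % re.escape(w['word'])
    # Same flag convention as the question's code: 'case' True -> IGNORECASE.
    flags = re.IGNORECASE if w['case'] else 0
    m = re.search(pattern, status, flags)
    if m:
        found.append(m.group())

print(found)  # ['test1', 'test2']
```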


As Martijn pointed out, there's no space before test1. But your code also never checks what comes after the word: it would report test2blabla as an instance of test2, and I'm not sure that is what you want.
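
To make this concrete: the (^|\s) pattern only constrains what comes before the word, so every sample below is reported as a match, including the two where test2 is merely the start of a longer word.

```python
import re

# Only the text *before* "test2" is constrained; nothing after it is checked.
pattern = re.compile(r"(^|\s)#?test2")

samples = ["test2", "foo test2", "test2blabla", "foo test2blabla"]
results = [bool(pattern.search(s)) for s in samples]

print(results)  # [True, True, True, True]
```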

I suggest using the word-boundary anchor \b:

for w in words:
    # \b on both sides: the word may not be part of a longer word
    if w['case']:
        r = re.compile(r"\b%s\b" % w['word'], re.IGNORECASE|re.MULTILINE)
    else:
        r = re.compile(r"\b%s\b" % w['word'], re.MULTILINE)
    if r.search(status):
        print("Found word %s" % w['word'])
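
With \b anchoring both ends, the question's concern about hundreds of words can also be handled by one compiled pattern instead of one regex per word. The sketch below is an alternative to the loop above, not part of the original answer: it joins the re.escape'd words with |, and for entries whose 'case' flag is true (read as "ignore case", as in the question's code) it wraps that alternative in a scoped inline flag group (?i:...), which needs Python 3.6 or later. The helper name alternative and the third entry in words are hypothetical.

```python
import re

words = [{'word': 'test1', 'case': False},
         {'word': 'test2', 'case': False},
         {'word': 'Test3', 'case': True}]   # hypothetical extra entry

def alternative(w):
    # Hypothetical helper: escape the word so regex metacharacters are literal,
    # and scope IGNORECASE to this alternative only when 'case' is True.
    body = re.escape(w['word'])
    return "(?i:%s)" % body if w['case'] else body

# \b on both sides rejects matches inside longer words such as "test2blabla".
combined = re.compile(r"\b(?:%s)\b" % "|".join(alternative(w) for w in words))

print(combined.findall("test1 test2"))      # ['test1', 'test2']
print(combined.findall("test2blabla"))      # []
print(combined.findall("TEST3 and TEST1"))  # ['TEST3']
```

findall returns the text as it appears in the string, so a case-insensitive hit comes back in its original casing.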

EDIT:

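I should have pointed out that if you really want to allow only the (whitespace)word or (whitespace)#word format, you cannot use \b: it only requires a switch between a word character and a non-word character, so \btest1\b also matches test1 after punctuation, as in foo.test1.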
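
A minimal sketch of a pattern for the stricter rule, under an assumption about what that rule should be: (?<!\S) requires the start of the string or preceding whitespace, #? allows the optional hash, and (?!\w) (assumed end-of-word rule) rejects a word that runs on into further word characters. The comparison with the \b version uses made-up sample strings.

```python
import re

loose = re.compile(r"\btest1\b")
# Start of string or after whitespace, optional '#', then the word,
# not followed by another word character (assumed end-of-word rule).
strict = re.compile(r"(?<!\S)#?test1(?!\w)")

samples = ["test1", "x #test1", "foo.test1", "a-test1", "test1blabla"]
results = {s: (bool(loose.search(s)), bool(strict.search(s))) for s in samples}

for s, (l, st) in results.items():
    print(repr(s), "loose:", l, "strict:", st)
```

Only foo.test1 and a-test1 separate the two: \b accepts them, the lookbehind version does not.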
