Search start of the word using regular expression

Developer https://www.devze.com 2023-03-21 02:14 Source: web
How can I write a regular expression that finds all words starting with a specified string? For example:

a = "asasasa sasDRasas dr.klklkl DR.klklklkl Dr klklklkklkl"

Here I want to fetch all words that start with dr, ignoring case. Everything I tried returns matches where dr appears anywhere in a word, not only at the start of the word.
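The attempts themselves are not shown, but a plain, unanchored search reproduces the symptom described: dr is found inside sasDRasas as well as at the start of words. This is only an illustration of the problem, not the asker's actual code.

```python
import re

a = "asasasa sasDRasas dr.klklkl DR.klklklkl Dr klklklkklkl"
# No anchor: "dr" matches wherever it occurs, including inside "sasDRasas"
unanchored = re.findall("dr", a, re.IGNORECASE)
print(unanchored)  # ['DR', 'dr', 'DR', 'Dr']
```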

Thanks in advance.


You can use \b to find word boundaries, and the re.IGNORECASE flag to search case-insensitively.

import re

a = "asasasa sasDRasas dr.klklkl DR.klklklkl Dr klklklkklkl"
# \b anchors "dr" to the start of a word; IGNORECASE also matches DR, Dr and dR
for match in re.finditer(r'\bdr', a, re.IGNORECASE):
    print('Found match: "{0}" at position {1}'.format(match.group(0), match.start()))

This will output:

Found match: "dr" at position 18
Found match: "DR" at position 28
Found match: "Dr" at position 40

Here, the pattern \bdr matches dr, but only if it is found at the start of a word. This will also yield matches for words like driving. If you only want to find dr as a whole word, use \bdr\b.
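A small comparison makes the difference between the two patterns concrete. The sample string is made up for illustration so that it contains driving, DRY and address.

```python
import re

text = "dr driving Dr. address DRY"
# \bdr: "dr" at the start of any word, including longer words like "driving"
starts = re.findall(r"\bdr", text, re.IGNORECASE)
# \bdr\b: "dr" only when it is a complete word on its own
whole = re.findall(r"\bdr\b", text, re.IGNORECASE)
print(starts)  # ['dr', 'dr', 'Dr', 'DR']
print(whole)   # ['dr', 'Dr']
```

The "dr" inside address never matches either pattern, because the preceding "d" is a word character, so there is no word boundary before it.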

I use re.finditer() to scan through the search string and yield every match for dr in a loop. The re.IGNORECASE flag causes dr to also match DR, Dr and dR.
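The question asks for whole words, while \bdr returns only the two matched letters. A minimal sketch, assuming a "word" means a run of \w characters: append \w* so each match continues to the end of the word. The sample sentence is made up for illustration.

```python
import re

# Compile once; \w* extends each match from "dr" to the end of the word
starts_with_dr = re.compile(r"\bdr\w*", re.IGNORECASE)

text = "Drive to the drugstore, not the address"
words = starts_with_dr.findall(text)
print(words)  # ['Drive', 'drugstore']
```

On the question's own string this still returns dr, DR and Dr, because the dot right after each prefix ends the \w run.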


@Ferdinand Beyer's answer shows how to do it with a regex, but you can easily achieve the same with string functions:

>>> import string
>>> a
'asasasa sasDRasas dr.klklkl DR.klklklkl Dr klklklkklkl'
>>> cleaned = "".join(" " if i in string.punctuation else i for i in a)
>>> cleaned
'asasasa sasDRasas dr klklkl DR klklklkl Dr klklklkklkl'
>>> [word for word in cleaned.split() if word.lower().startswith("dr")]
['dr', 'DR', 'Dr']
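The same string-based idea can be wrapped in a reusable function. This is a sketch: words_starting_with is a hypothetical name, and str.translate is used here in place of the generator expression to replace every punctuation character with a space.

```python
import string

def words_starting_with(text, prefix):
    # Translation table mapping every punctuation character to a space
    table = str.maketrans(string.punctuation, " " * len(string.punctuation))
    cleaned = text.translate(table)
    prefix = prefix.lower()
    # Case-insensitive prefix test on each whitespace-separated word
    return [word for word in cleaned.split() if word.lower().startswith(prefix)]

a = "asasasa sasDRasas dr.klklkl DR.klklklkl Dr klklklkklkl"
print(words_starting_with(a, "dr"))  # ['dr', 'DR', 'Dr']
```

Because punctuation is replaced before matching, a prefix that itself contains punctuation (such as "dr.") will never match with this approach.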


>>> import re
>>> string_to_search_in = 'this a a dr.seuse dr.brown dr. oz dr noone'
>>> re.compile(r'\bdr\.?\s*\w+', re.IGNORECASE).findall(string_to_search_in)
['dr.seuse', 'dr.brown', 'dr. oz', 'dr noone']
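To spell out what each part of such a pattern does, here is a commented version written with re.VERBOSE. It assumes the intent is "dr", an optional literal dot, optional whitespace and then the following word. Two details matter: an unescaped . matches any character, and in a non-raw Python string '\b' is a backspace character rather than a word boundary, so a raw string is used.

```python
import re

pattern = re.compile(
    r"""
    \b      # start at a word boundary
    dr      # the prefix (matched case-insensitively via IGNORECASE)
    \.?     # an optional literal dot
    \s*     # optional whitespace between the prefix and the name
    \w+     # the word that follows
    """,
    re.IGNORECASE | re.VERBOSE,
)

s = "this a a dr.seuse dr.brown dr. oz dr noone"
result = pattern.findall(s)
print(result)  # ['dr.seuse', 'dr.brown', 'dr. oz', 'dr noone']
```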


Yet another solution.

The following expression returns the words in a string that either exactly match, or start with, the value of a string variable.

import re

txt = "this a a dr.seuse dr.brown dr. oz dr noone"
suggtxt= "dr."
w_regex = r"\b"+re.escape(suggtxt)+r"+\S*"
x = re.findall(w_regex, txt,  re.IGNORECASE)
print(x)

Output:

['dr.seuse', 'dr.brown', 'dr.']
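One caveat with a leading \b: it only matches next to a word character, so it misbehaves when the search string itself starts with punctuation, such as ".net". A minimal sketch that swaps \b for the lookbehind (?<!\S), meaning "preceded by whitespace or the start of the string"; find_tokens_starting_with is a hypothetical helper name, and it treats whitespace-separated tokens as words.

```python
import re

def find_tokens_starting_with(text, prefix):
    # (?<!\S): the match must follow whitespace or the start of the string,
    # which works whether or not the prefix begins with a word character
    pattern = r"(?<!\S)" + re.escape(prefix) + r"\S*"
    return re.findall(pattern, text, re.IGNORECASE)

txt = "this a a dr.seuse dr.brown dr. oz dr noone"
print(find_tokens_starting_with(txt, "dr."))  # ['dr.seuse', 'dr.brown', 'dr.']

# With \b, a prefix starting with "." needs a word character right before it
print(re.findall(r"\b" + re.escape(".net") + r"\S*", "I like .net"))  # []
print(find_tokens_starting_with("I like .net", ".net"))  # ['.net']
```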