How do I write a regular expression that finds all words starting with a specified string? For example:
a = "asasasa sasDRasas dr.klklkl DR.klklklkl Dr klklklkklkl"
Here I want to fetch all words that start with dr
, ignoring case. Everything I tried also returns results where dr
is found inside a word, not only at the start of a word.
Thanks in advance.
You can use \b
to find word boundaries, and the re.IGNORECASE
flag to search case-insensitively.
import re
a = "asasasa sasDRasas dr.klklkl DR.klklklkl Dr klklklkklkl"
for match in re.finditer(r'\bdr', a, re.IGNORECASE):
    print('Found match: "{0}" at position {1}'.format(match.group(0), match.start()))
This will output:
Found match: "dr" at position 18
Found match: "DR" at position 28
Found match: "Dr" at position 40
Here, the pattern \bdr
matches dr, but only if it is found at the start of a word. This also yields matches in longer words such as driving. If you only want to find dr as a standalone word, use \bdr\b
.
I use re.finditer()
to scan through the search string and yield every match for dr in a loop. The re.IGNORECASE
flag causes dr
to also match DR
, Dr
and dR
.
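To make the difference between the two patterns concrete, here is a small sketch. The sample string and the variable names are made up for illustration and do not come from the question. The third pattern, \bdr\w*, is an extra variation (not part of the answer above) that returns the whole word rather than just the prefix.

```python
import re

# Hypothetical sample text; "address" contains "dr" in the middle of a word.
sample = "Dr Who was driving; the address is DR.house"

# \bdr matches the prefix wherever a word starts with it.
prefix_hits = [m.group(0) for m in re.finditer(r'\bdr', sample, re.IGNORECASE)]

# \bdr\b matches only when "dr" is the complete word.
whole_hits = [m.group(0) for m in re.finditer(r'\bdr\b', sample, re.IGNORECASE)]

# \bdr\w* returns the full word that starts with the prefix.
full_words = re.findall(r'\bdr\w*', sample, re.IGNORECASE)

print(prefix_hits)  # ['Dr', 'dr', 'DR']
print(whole_hits)   # ['Dr', 'DR']
print(full_words)   # ['Dr', 'driving', 'DR']
```

Note that the dr inside address is never matched, because it does not sit at a word boundary.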
@Ferdinand Beyer's answer shows how to do it with a regex. You can also achieve this with plain string functions:
>>> import string
>>> a
'asasasa sasDRasas dr.klklkl DR.klklklkl Dr klklklkklkl'
>>> cleaned = "".join(" " if i in string.punctuation else i for i in a)
>>> cleaned
'asasasa sasDRasas dr klklkl DR klklklkl Dr klklklkklkl'
>>> [word for word in cleaned.split() if word.lower().startswith("dr")]
['dr', 'DR', 'Dr']
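The same steps can be wrapped in a small helper. This is only a sketch: the function name words_starting_with and its parameters are hypothetical, and it keeps the behaviour shown above, where punctuation splits dr.klklkl into two separate words.

```python
import string

def words_starting_with(text, prefix):
    """Return the words in text that start with prefix, ignoring case."""
    # Replace every punctuation character with a space so that
    # "dr.klklkl" is split into "dr" and "klklkl".
    cleaned = "".join(" " if ch in string.punctuation else ch for ch in text)
    prefix = prefix.lower()
    return [word for word in cleaned.split() if word.lower().startswith(prefix)]

a = "asasasa sasDRasas dr.klklkl DR.klklklkl Dr klklklkklkl"
result = words_starting_with(a, "dr")
print(result)  # ['dr', 'DR', 'Dr']
```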
>>> string_to_search_in
'this a a dr.seuse dr.brown dr. oz dr noone'
>>> re.compile(r'\bdr\.?\s*\w+', re.IGNORECASE).findall(string_to_search_in)
['dr.seuse', 'dr.brown', 'dr. oz', 'dr noone']
Yet another solution.
The expression below finds the words in a string that either equal or start with the text held in a string variable.
import re
txt = "this a a dr.seuse dr.brown dr. oz dr noone"
suggtxt= "dr."
w_regex = r"\b" + re.escape(suggtxt) + r"\S*"  # word boundary, the literal prefix, then the rest of the word
x = re.findall(w_regex, txt, re.IGNORECASE)
print(x)
Output:
['dr.seuse', 'dr.brown', 'dr.']
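One caveat with building the pattern from a variable: \b only behaves as a "start of word" marker when the prefix itself begins with a letter, digit, or underscore. A possible variation, sketched below, uses the lookbehind (?<!\w) instead. The helper name find_words_with_prefix is hypothetical, and this is an alternative under that assumption rather than the method from the answer above.

```python
import re

def find_words_with_prefix(text, prefix):
    """Return whitespace-delimited tokens that start with prefix, ignoring case."""
    # (?<!\w) asserts that no word character comes directly before the prefix,
    # re.escape() makes the prefix literal, and \S* takes the rest of the token.
    pattern = r"(?<!\w)" + re.escape(prefix) + r"\S*"
    return re.findall(pattern, text, re.IGNORECASE)

txt = "this a a dr.seuse dr.brown dr. oz dr noone"
with_dot = find_words_with_prefix(txt, "dr.")
without_dot = find_words_with_prefix(txt, "dr")
print(with_dot)     # ['dr.seuse', 'dr.brown', 'dr.']
print(without_dot)  # ['dr.seuse', 'dr.brown', 'dr.', 'dr']
```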