How to extract a list of substrings based on some patterns in Python?
For example:
str = 'this {{is}} a sample {{text}}'
Expected result: a Python list which contains 'is' and 'text'.
>>> import re
>>> re.findall("{{(.*?)}}", "this {{is}} a sample {{text}}")
['is', 'text']
Assuming "some patterns" means "single words between double {}'s":
import re
# string is the input text; the raw string keeps \w from being read as a string escape
re.findall(r'{{(\w*)}}', string)
Edit: Andrew Clark's answer implements "any sequence of characters at all between double {}'s"
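To make the difference between the patterns in this thread concrete, here is a small sketch. The sample string `text` is made up for illustration and is not from the question.

```python
import re

# Hypothetical input: a one-word tag, a multi-word tag, and a tag containing a stray brace.
text = "this {{is}} a {{multi word}} sample {{a}b}}"

# \w* only accepts word characters, so tags with spaces or braces are skipped.
words_only = re.findall(r"{{(\w*)}}", text)

# .*? lazily accepts any characters up to the first following "}}".
lazy_any = re.findall(r"{{(.*?)}}", text)

# [^{}]* accepts anything except braces, so the tag with a stray brace is skipped.
no_braces = re.findall(r"{{([^{}]*)}}", text)

print(words_only)
print(lazy_any)
print(no_braces)
```

Which pattern is appropriate depends on what is allowed to appear inside the braces in your real input.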
You can use the following:
import re
a = 'this {{is}} a sample {{text}}'  # the input string
res = re.findall("{{([^{}]*)}}", a)
print("a python list which contains %s and %s" % (res[0], res[1]))
Cheers
A regex-based solution is fine for your example, although I would recommend something more robust for more complicated input.
import re
def match_substrings(s):
    return re.findall(r"{{([^}]*)}}", s)
The regex from inside-out:
[^}]
matches anything that's not a '}'
([^}]*)
matches any number of non-} characters and groups them
{{([^}]*)}}
puts the above inside double-braces
Without the parentheses above, re.findall would return the entire match (i.e. ['{{is}}', '{{text}}']). However, when the regex contains a group, findall will use that instead.
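A quick way to check the grouping behaviour is to run the same pattern with and without the parentheses. This is only a sketch using the question's sample string; the variable names are illustrative.

```python
import re

text = "this {{is}} a sample {{text}}"

# With a capturing group, findall returns only what the group matched.
with_group = re.findall(r"{{([^}]*)}}", text)

# Without a group, findall returns each entire match, braces included.
without_group = re.findall(r"{{[^}]*}}", text)

print(with_group)     # ['is', 'text']
print(without_group)  # ['{{is}}', '{{text}}']
```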
You could use a regular expression to match anything that occurs between {{ and }}. Will that work for you?
Generally speaking, for tagging certain strings in a large body of text, a suffix tree will be useful.