Python - extracting a list of sub strings

开发者 https://www.devze.com 2023-01-31 11:14 Source: web
How to extract a list of substrings based on some patterns in Python?

for example,

str = 'this {{is}} a sample {{text}}'.

Expected result: a Python list which contains 'is' and 'text'.


>>> import re
>>> re.findall("{{(.*?)}}", "this {{is}} a sample {{text}}")
['is', 'text']


Assuming "some patterns" means "single words between double {}'s":

import re

string = 'this {{is}} a sample {{text}}'
re.findall(r'{{(\w*)}}', string)  # raw string avoids the invalid-escape warning for \w

Edit: Andrew Clark's answer implements "any sequence of characters at all between double {}'s"
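
To make the difference between the two readings concrete, here is a small sketch. The sample string with a multi-word tag is my own addition, not from the question; the two patterns are the ones from the answers above.

```python
import re

# Hypothetical input: the question's example plus one tag containing a space.
sample = "this {{is}} a {{multi word}} sample {{text}}"

# \w* only matches letters, digits and underscores,
# so the tag containing a space is skipped entirely.
words_only = re.findall(r"{{(\w*)}}", sample)

# .*? lazily matches any characters up to the nearest '}}'.
anything = re.findall(r"{{(.*?)}}", sample)

print(words_only)  # ['is', 'text']
print(anything)    # ['is', 'multi word', 'text']
```

Which one is right depends on what the tags are allowed to contain.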


You can use the following:

import re
a = 'this {{is}} a sample {{text}}'
res = re.findall("{{([^{}]*)}}", a)
print("a python list which contains %s and %s" % (res[0], res[1]))

Cheers


A regex-based solution is fine for your example, although I would recommend something more robust for more complicated input.

import re

def match_substrings(s):
    return re.findall(r"{{([^}]*)}}", s)

The regex from inside-out:

[^}] matches anything that's not a '}'
([^}]*) matches any number of non-} characters and groups them
{{([^}]*)}} puts the above inside double-braces

Without the parentheses above, re.findall would return the entire match (i.e. ['{{is}}', '{{text}}']). However, when the regex contains a single group, findall returns only the text captured by that group instead.
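
A quick way to see the effect of the group is to run the same pattern with and without the parentheses. This is only an illustrative sketch; the variable names are made up, and re.finditer is shown as one option if you want both the whole match and the inner text.

```python
import re

text = "this {{is}} a sample {{text}}"

# No group: findall returns each whole match, braces included.
whole = re.findall(r"{{[^}]*}}", text)

# One group: findall returns only what the group captured.
inner = re.findall(r"{{([^}]*)}}", text)

# finditer yields match objects, so both forms are available.
pairs = [(m.group(0), m.group(1)) for m in re.finditer(r"{{([^}]*)}}", text)]

print(whole)  # ['{{is}}', '{{text}}']
print(inner)  # ['is', 'text']
print(pairs)  # [('{{is}}', 'is'), ('{{text}}', 'text')]
```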


You could use a regular expression to match anything that occurs between {{ and }}. Will that work for you?

Generally speaking, for tagging certain strings in a large body of text, a suffix tree will be useful.
