Regex for template tag with attributes

Developer https://www.devze.com 2023-02-03 17:01 Source: Internet
I haven't found my answer after reading through all of these posts, so I'm hoping one of you heavy hitter regex folks can help me out. I'm trying to isolate the tag name and any attributes from the following string format:

{TAG:TYPE attr1="foo" attr2="bar" attr3="zing" attr4="zang" attr5="zoom" ...}

NOTE: in the above example, TAG will always be the same and TYPE will be one of several preset strings (e.g. share, print, display etc...). TAG and TYPE are uppercased only for the example but will not be case sensitive for real.


For the moment, let's assume that your attribute names and values, as well as your TAG and TYPE, are strictly alphanumeric. Parsing gets messier (and may not even be regular) if you could have " or = inside those strings.

With those caveats, here's a Python regex that gets the job done:

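>>> import re
>>> # the example string from the question, without the trailing "..."
>>> s = '{TAG:TYPE attr1="foo" attr2="bar" attr3="zing" attr4="zang" attr5="zoom"}'
>>> parse_regex = r'\{(?P<tag>\w+):(?P<type>\w+)(?P<attrs>(\s+\w+="\w+")*)\}'
>>> m = re.match(parse_regex, s)
>>> m.group('tag')
'TAG'
>>> m.group('type')
'TYPE'
>>> m.group('attrs')
' attr1="foo" attr2="bar" attr3="zing" attr4="zang" attr5="zoom"'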
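Keep in mind that re.match returns None (it does not raise) when the string doesn't fit the pattern, for example when a value contains a space, so calling m.group(...) blindly would fail. A minimal sketch of a guard, assuming a hypothetical helper name try_parse and made-up sample strings:

```python
import re

# Same pattern as in the answer; the sample strings below are made up for illustration.
PARSE_REGEX = r'\{(?P<tag>\w+):(?P<type>\w+)(?P<attrs>(\s+\w+="\w+")*)\}'

def try_parse(text):
    """Hypothetical helper: return the named groups as a dict, or None if text does not fit."""
    m = re.match(PARSE_REGEX, text)
    return m.groupdict() if m else None

ok = try_parse('{TAG:share attr1="foo"}')
bad = try_parse('{TAG:share attr1="foo bar"}')  # space inside the value breaks the \w+ assumption
```

Here ok holds the three named groups, while bad is None because the value is not strictly alphanumeric.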

At this point, you'd want to clean up the attributes into a friendly data structure. Since there could be arbitrarily many of them, it's going to be more convenient (and just as efficient) not to use regexps for this stage.

>>> [attr_str.split('=') for attr_str in m.group('attrs').split()]
[['attr1', '"foo"'], ['attr2', '"bar"'], ['attr3', '"zing"'], ['attr4', '"zang"'], ['attr5', '"zoom"']]
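If you'd rather end up with a dict and without the surrounding quotes, something along these lines should work. This is only a sketch under the same alphanumeric assumption; parse_tag is a hypothetical name, and lowercasing the tag and type is just one way to handle the case insensitivity mentioned in the question:

```python
import re

PARSE_REGEX = r'\{(?P<tag>\w+):(?P<type>\w+)(?P<attrs>(\s+\w+="\w+")*)\}'

def parse_tag(text):
    """Hypothetical helper: return (tag, type, attrs_dict), or None on no match."""
    m = re.match(PARSE_REGEX, text)
    if m is None:
        return None
    attrs = {}
    for attr_str in m.group('attrs').split():
        name, value = attr_str.split('=', 1)
        attrs[name] = value.strip('"')  # drop the surrounding double quotes
    # Lowercase tag and type, since the question says they are not case sensitive.
    return m.group('tag').lower(), m.group('type').lower(), attrs

result = parse_tag('{TAG:Share attr1="foo" attr2="bar"}')
```

With that input, result is a tuple of the lowercased tag, the lowercased type, and a dict mapping attr1 to foo and attr2 to bar.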