Regexp: remove all tags from string except one kind of tags

开发者 (https://www.devze.com), 2023-03-23 15:42, source: Internet
I have a string like this:

<p>test <span class=\"match\">match</span> <span class=\"testtes\">dddddd</span></p>

I want to strip the tags from this string, but keep the span with class "match" so that the highlighting is preserved:

test <span class=\"match\">match</span> dddddd

If I just wanted to remove all tags, I would replace every substring matching the regexp /<\/?[^>]*>/ with an empty string. But what regexp should I use in this special case?
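
For reference, the strip-everything baseline described above can be sketched in Python. The question does not name a language, so Python and the helper name `strip_all_tags` are assumptions made purely for illustration; the pattern is the one from the question.

```python
import re

# Pattern from the question: "<", an optional "/", then anything up to ">".
ANY_TAG = re.compile(r"</?[^>]*>")

def strip_all_tags(html: str) -> str:
    """Remove every tag and keep only the text between tags."""
    return ANY_TAG.sub("", html)

# The sample string, including its literal backslash-escaped quotes.
sample = r'<p>test <span class=\"match\">match</span> <span class=\"testtes\">dddddd</span></p>'
print(strip_all_tags(sample))  # test match dddddd
```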

UPD: The algorithm is: if you see <span class=\"match\">, then some text without tags, and then </span>, you shouldn't remove that span; otherwise you should remove all tags.
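
Under the restriction in this update (a kept span contains only tag-free text), the rule can be sketched as one alternation: the first branch captures a whole protected span, the second matches any other tag, and the replacement keeps only what the first branch captured. This is a sketch under assumptions, not a general solution: Python and the name `strip_tags_keep_match_regex` are illustrative choices, the span must be written exactly as in the sample (including the literal backslashes), and any span with nested tags loses its own tags as well.

```python
import re

KEEP_OR_TAG = re.compile(
    r'(<span class=\\"match\\">[^<]*</span>)'  # group 1: protected span, tag-free content
    r'|</?[^>]*>'                               # otherwise: any tag at all
)

def strip_tags_keep_match_regex(html: str) -> str:
    """Drop every tag except match-spans whose content contains no tags."""
    # group(1) is None when the second branch matched, so that tag is dropped.
    return KEEP_OR_TAG.sub(lambda m: m.group(1) or "", html)

sample = r'<p>test <span class=\"match\">match</span> <span class=\"testtes\">dddddd</span></p>'
print(strip_tags_keep_match_regex(sample))
```

Because `[^<]*` cannot cross a `<`, a match-span that contains another tag never satisfies the first branch, and its tags are removed like all the others.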


You could do something like this:

<\/?(?![^>]*class=\\"match)[^>]*>

This would preserve the opening tag, but not its closing tag, and result in this:

test <span class=\"match\">match dddddd

See it here on Regexr
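
The same substitution can be reproduced in Python (an assumed language, since the question names none; the variable names are illustrative). The negative lookahead rejects any tag that has class=\"match before its closing ">", so only that opening tag survives.

```python
import re

# Any tag, unless class=\"match appears before the tag's closing ">".
# In the pattern, \\" matches a literal backslash followed by a double quote.
ALL_BUT_MATCH_OPEN = re.compile(r'</?(?![^>]*class=\\"match)[^>]*>')

sample = r'<p>test <span class=\"match\">match</span> <span class=\"testtes\">dddddd</span></p>'
result = ALL_BUT_MATCH_OPEN.sub("", sample)
print(result)  # test <span class=\"match\">match dddddd
```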

But how should I find the matching closing tag?

<p>test <span class=\"match\">match</span> <span class=\"testtes\">dddddd</span></p>
                                   ^^^^^^^          or the next one?     ^^^^^^^

A regex can't know which closing tag belongs to the opening <span> tag that carries that class, because it has no way to pair opening tags with their closing tags. So it's not a good idea to do this with a regex.

I am quite sure the language you are using has an HTML parser that can be used for this task.
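
As one possible illustration of the parser route, here is a minimal sketch using Python's standard `html.parser` module. Everything in it is an assumption rather than the answerer's method: the language, the class name `KeepMatchSpans`, the helper `strip_tags_keep_match`, and the choice to feed the parser the unescaped form of the sample (treating the backslashes as string-literal escaping).

```python
from html.parser import HTMLParser

class KeepMatchSpans(HTMLParser):
    """Drop every tag except <span> elements whose class list contains "match"."""

    def __init__(self):
        # Keep entities such as &lt; verbatim instead of decoding them.
        super().__init__(convert_charrefs=False)
        self.out = []        # output fragments, joined at the end
        self.span_kept = []  # one flag per currently open <span>: was it kept?

    def handle_starttag(self, tag, attrs):
        if tag != "span":
            return  # every other tag is simply dropped
        classes = (dict(attrs).get("class") or "").split()
        keep = "match" in classes
        self.span_kept.append(keep)
        if keep:
            self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        # Pop the flag of the innermost open span; emit </span> only if it was kept.
        if tag == "span" and self.span_kept and self.span_kept.pop():
            self.out.append("</span>")

    def handle_data(self, data):
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append(f"&{name};")

    def handle_charref(self, name):
        self.out.append(f"&#{name};")

def strip_tags_keep_match(html: str) -> str:
    parser = KeepMatchSpans()
    parser.feed(html)
    parser.close()
    return "".join(parser.out)

sample = '<p>test <span class="match">match</span> <span class="testtes">dddddd</span></p>'
print(strip_tags_keep_match(sample))  # test <span class="match">match</span> dddddd
```

The stack of flags is what the regex lacks: each closing `</span>` is paired with the most recently opened span, so nested spans are handled correctly.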
