
Regular expression to remove HTML links [duplicate]

开发者 (Developer) https://www.devze.com 2023-04-08 13:10 Source: web
This question already has answers here. Closed 11 years ago.

Possible Duplicate:

Regular expression for parsing links from a webpage?

RegEx match open tags except XHTML self-contained tags

I need a regular expression to strip HTML <a> tags. Here is a sample:

<a href="xxxx" class="yyy" title="zzz" ...> link </a>

should be converted to

 link


I think you're looking for: </?a(|\s+[^>]+)>


Some of the other answers would also match valid HTML tags such as <abbr>, <address>, or <applet> and strip them out erroneously. A better regex that matches only anchor tags would be:

</?a(?:(?= )[^>]*)?>
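To make the difference concrete, here is a small Python sketch comparing a loose pattern (`</?a.*?>`, as suggested elsewhere in this thread) with the lookahead version. The sample HTML string is made up for illustration. Note that the lookahead only accepts a literal space after `a`, so a tab or newline there would not match.

```python
import re

# Hypothetical input: one anchor plus an <abbr> that must survive.
html = '<p>See <a href="x" class="y">the docs</a> for <abbr title="HyperText">HTML</abbr>.</p>'

# Loose pattern: anything that starts with "<a" or "</a" counts as a match.
loose = re.compile(r'</?a.*?>')

# Lookahead pattern: after "a" there must be either ">" or a space.
strict = re.compile(r'</?a(?:(?= )[^>]*)?>')

loose_result = loose.sub('', html)
strict_result = strict.sub('', html)

print(loose_result)   # the <abbr> tags are stripped as well
print(strict_result)  # only the <a> tags are stripped
```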


Here's what I would use:

</?a\b[^>]*>
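For what it's worth, a minimal Python sketch of applying that pattern could look like this. The function name `strip_anchor_tags` and the `re.IGNORECASE` flag are my own additions (the flag is there on the assumption that you also want to catch `<A HREF=...>`). Since `\b` also matches before a hyphen, a custom element like `<a-foo>` would be stripped too.

```python
import re

# \b requires a word boundary after "a", so <abbr>, <address>, <applet> are left alone.
# re.IGNORECASE is an assumption, added so that <A ...> is handled as well.
ANCHOR_TAG = re.compile(r'</?a\b[^>]*>', re.IGNORECASE)

def strip_anchor_tags(html: str) -> str:
    """Remove opening and closing <a> tags, keeping the link text."""
    return ANCHOR_TAG.sub('', html)

sample = '<a href="xxxx" class="yyy" title="zzz"> link </a>'
print(repr(strip_anchor_tags(sample)))  # ' link '
```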


You're going to have to apply this hackish solution iteratively, and even then it probably won't work perfectly on complicated HTML:

<a(\s[^>]*)?>.*?(</a>)?

Alternatively, you can try one of the existing HTML sanitizers/parsers out there.


HTML is not a regular language; any regex we give you will not be 'correct'. It's impossible. Even Jon Skeet and Chuck Norris can't do it. Before I lapse into a fit of rage, like @bobince [in]famously once did, I'll just say this:

Use an HTML parser.

(Whatever they're called.)
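As a rough illustration of the parser route, here is a sketch using Python's standard-library `html.parser`. The class name `AnchorStripper` and the helper `strip_anchor_tags` are hypothetical names for this example. It re-emits everything except `<a>` start and end tags; processing instructions and CDATA sections are not handled, so treat it as a starting point rather than a complete sanitizer.

```python
from html.parser import HTMLParser

class AnchorStripper(HTMLParser):
    """Re-emit the input markup, dropping only <a ...> and </a> tags."""

    def __init__(self):
        # convert_charrefs=False keeps entities such as &amp; as they were written.
        super().__init__(convert_charrefs=False)
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag != 'a':
            self.parts.append(self.get_starttag_text())  # raw text of the tag

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags like <br/>: emit the raw text once.
        if tag != 'a':
            self.parts.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag != 'a':
            self.parts.append(f'</{tag}>')

    def handle_data(self, data):
        self.parts.append(data)

    def handle_entityref(self, name):
        self.parts.append(f'&{name};')

    def handle_charref(self, name):
        self.parts.append(f'&#{name};')

    def handle_comment(self, data):
        self.parts.append(f'<!--{data}-->')

    def handle_decl(self, decl):
        self.parts.append(f'<!{decl}>')

def strip_anchor_tags(html: str) -> str:
    parser = AnchorStripper()
    parser.feed(html)
    parser.close()
    return ''.join(parser.parts)

# The ">" inside the title attribute would trip up a [^>]* based regex.
html = '<p>See <a href="x" title="a > b">the docs</a> &amp; <abbr title="t">HTML</abbr>.</p>'
result = strip_anchor_tags(html)
print(result)
```

Unlike the regex answers, this does not care whether an attribute value contains `>`, because the parser tracks quoting for you.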


EDIT:

If you want to 'incorrectly' strip out </a>s that don't have any <a>s as well, do this:

</?[a\s]*[^>]*>


</?a.*?> would work. Replace each match with '' (the empty string).
