Regular expression for removing HTML links [duplicate]

Developer (https://www.devze.com), 2023-04-08 13:10, source: the web

Related topic: regex

This question already has answers here. Closed 11 years ago.

Possible Duplicate:
Regular expression for parsing links from a webpage?
RegEx match open tags except XHTML self-contained tags

I need a regular expression to strip HTML <a> tags. Here is a sample:

<a href="xxxx" class="yyy" title="zzz" ...> link </a>

It should be converted to:

 link

I think you're looking for: </?a(|\s+[^>]+)>
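A quick check of this pattern in Python (the language and the `re` module are my choice; the thread doesn't name one). The sample is the question's, with the `...` placeholder dropped from the tag; `A_TAG_SIMPLE` is a hypothetical name.

```python
import re

# "<a" or "</a", then either nothing or whitespace plus attributes, then ">".
A_TAG_SIMPLE = re.compile(r"</?a(|\s+[^>]+)>")

sample = '<a href="xxxx" class="yyy" title="zzz"> link </a>'
print(repr(A_TAG_SIMPLE.sub("", sample)))  # prints ' link '
```

As written the pattern is case-sensitive, so `<A HREF=...>` is left alone unless you compile it with `re.IGNORECASE`.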

Answers given above would match valid HTML tags such as <abbr>, <address>, or <applet> and strip them out erroneously. A better regex, which matches only anchor tags, would be:

</?a(?:(?= )[^>]*)?>
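A small Python sketch showing that this pattern removes `<a>`/`</a>` but leaves `<abbr>` intact. The sample markup and the name `A_TAG_LOOKAHEAD` are made up for illustration.

```python
import re

# The optional attribute group is only entered when "a" is followed by a
# literal space (the lookahead), so "<abbr" fails: "b" is neither " " nor ">".
A_TAG_LOOKAHEAD = re.compile(r"</?a(?:(?= )[^>]*)?>")

html = '<abbr title="x">HTML</abbr> <a href="y">link</a>'
print(A_TAG_LOOKAHEAD.sub("", html))  # prints <abbr title="x">HTML</abbr> link
```

Because the lookahead `(?= )` requires a literal space, a tag with a tab or newline after the `a` (e.g. `<a\thref="x">`) is not matched, although its `</a>` still is.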

Here's what I would use:

</?a\b[^>]*>
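The same idea with `\b`, sketched in Python. Adding `re.IGNORECASE` and the helper name `strip_anchor_tags` are my own assumptions, not part of the answer.

```python
import re

# \b asserts a word boundary right after "a": it holds before ">", a space,
# a tab or a newline, but not before the "b" of "<abbr>".
# re.IGNORECASE is an addition of this sketch, so <A ...> is caught too.
A_TAG_BOUNDARY = re.compile(r"</?a\b[^>]*>", re.IGNORECASE)


def strip_anchor_tags(html: str) -> str:
    """Remove <a ...> and </a> tags, keeping the link text (hypothetical helper)."""
    return A_TAG_BOUNDARY.sub("", html)


print(strip_anchor_tags('<A\tHREF="x">link</A> <abbr>ok</abbr>'))
# prints link <abbr>ok</abbr>
```

Like the other patterns in this thread, `[^>]*` stops at the first `>`, so an attribute value containing `>` (e.g. `<a title="1 > 0">x</a>`) leaves stray text such as ` 0">x` behind.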

You're going to have to use this hackish solution iteratively, and it probably won't even work perfectly for complicated HTML:

<a(\s[^>]*)?>.*?(</a>)?

Alternatively, you can try one of the existing HTML sanitizers/parsers out there.
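For the parser route, here is a minimal sketch using Python's standard-library `html.parser.HTMLParser` (my choice of parser; the answer doesn't name one). It re-emits the markup, dropping only `<a>` start and end tags and keeping the link text. `AnchorStripper` and `strip_anchors` are hypothetical names. End tags are re-emitted in lower case, processing instructions and CDATA sections are not handled, and this is not a sanitizer for untrusted HTML.

```python
from html.parser import HTMLParser


class AnchorStripper(HTMLParser):
    """Re-emits parsed HTML, skipping <a> start/end tags but keeping their text."""

    def __init__(self):
        # convert_charrefs=False: entities such as &amp; reach handle_entityref /
        # handle_charref instead of being decoded into the text.
        super().__init__(convert_charrefs=False)
        self.out = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag names, so <A HREF=...> is skipped too.
        if tag != "a":
            self.out.append(self.get_starttag_text())  # tag text as written

    def handle_startendtag(self, tag, attrs):
        # Overridden so a self-closing tag like <br/> is emitted once, as written.
        if tag != "a":
            self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag != "a":
            self.out.append(f"</{tag}>")  # re-emitted in lower case

    def handle_data(self, data):
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append(f"&{name};")

    def handle_charref(self, name):
        self.out.append(f"&#{name};")

    def handle_comment(self, data):
        self.out.append(f"<!--{data}-->")

    def handle_decl(self, decl):
        self.out.append(f"<!{decl}>")


def strip_anchors(html: str) -> str:
    """Return html with every <a ...> and </a> tag removed."""
    parser = AnchorStripper()
    parser.feed(html)
    parser.close()
    return "".join(parser.out)


print(strip_anchors('<p>See <a href="x">the docs</a> &amp; <abbr>HTML</abbr></p>'))
# prints <p>See the docs &amp; <abbr>HTML</abbr></p>
```

Unlike the regexes above, the parser understands quoted attribute values, so `<a title="1 > 0">x</a>` comes out as just `x`.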

HTML is not a regular language; any regex we give you will not be 'correct'. It's impossible. Even Jon Skeet and Chuck Norris can't do it. Before I lapse into a fit of rage, like @bobince [in]famously once did, I'll just say this: