Possible Duplicate:
Regular expression for parsing links from a webpage?
RegEx match open tags except XHTML self-contained tags
I need a regular expression to strip HTML <a> tags. Here is a sample:
<a href="xxxx" class="yyy" title="zzz" ...> link </a>
should be converted to
link
I think you're looking for: </?a(|\s+[^>]+)>
Answers given above would match valid HTML tags such as <abbr>, <address>, or <applet> and strip them out erroneously. A better regex to match only anchor tags would be:
</?a(?:(?= )[^>]*)?>
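To illustrate the difference, here is a rough Python sketch. Python is my choice (the question doesn't name a language), and the sample string and variable names are made up for the demonstration. It compares the loose pattern </?a.*?> suggested elsewhere in this thread with the stricter one above:

```python
import re

# Loose pattern from another answer: any tag whose name merely starts with "a".
LOOSE = re.compile(r"</?a.*?>")
# Stricter pattern from this answer: the name must be exactly "a",
# optionally followed by a space and attributes.
STRICT = re.compile(r"</?a(?:(?= )[^>]*)?>")

# Hypothetical sample mixing an <abbr> element with an anchor.
sample = '<abbr title="x">HTML</abbr> and <a href="xxxx" class="yyy">link</a>'

loose_result = LOOSE.sub("", sample)
strict_result = STRICT.sub("", sample)

print(loose_result)   # HTML and link
print(strict_result)  # <abbr title="x">HTML</abbr> and link
```

Note that the lookahead only accepts a literal space after the tag name, so an anchor whose attributes start after a tab or newline would be left alone by the stricter pattern.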
Here's what I would use:
</?a\b[^>]*>
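A minimal sketch of how that pattern could be applied, assuming Python. The function name strip_anchor_tags and the re.IGNORECASE flag are my own additions, not part of the answer:

```python
import re

# \b requires a word boundary after "a", so <abbr> and <address> do not match.
# IGNORECASE is an assumption here, added so that <A HREF=...> is handled too.
ANCHOR_TAG = re.compile(r"</?a\b[^>]*>", re.IGNORECASE)

def strip_anchor_tags(html):
    """Remove opening and closing <a> tags, keeping the text between them."""
    return ANCHOR_TAG.sub("", html)

# Sample taken from the question (without the elided attributes).
sample = '<a href="xxxx" class="yyy" title="zzz"> link </a>'
print(repr(strip_anchor_tags(sample)))                   # ' link '
print(strip_anchor_tags("<address>Main St</address>"))   # unchanged
```

Like any regex approach it has limits: an attribute value that contains a literal > (for example title="a > b") ends the match early and leaves debris behind.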
You're going to have to use this hackish solution iteratively, and it probably won't even work perfectly for complicated HTML:
<a(\s[^>]*)?>.*?(</a>)?
Alternatively, you can try one of the existing HTML sanitizers/parsers out there.
HTML is not a regular language; any regex we give you will not be 'correct'. It's impossible. Even Jon Skeet and Chuck Norris can't do it. Before I lapse into a fit of rage, like @bobince [in]famously once did, I'll just say this:
Use an HTML parser.
(Whatever they're called.)
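For what it's worth, here is one way the parser route might look, sketched with Python's standard-library html.parser module. The answer doesn't name a language or library, so that choice, the class name AnchorStripper, and the helper strip_anchors are all assumptions of mine. The idea is to re-emit everything the parser reports except <a> start and end tags:

```python
from html.parser import HTMLParser

class AnchorStripper(HTMLParser):
    """Re-emits the input while dropping <a> start and end tags."""

    def __init__(self):
        # Keep entities such as &amp; as written instead of decoding them.
        super().__init__(convert_charrefs=False)
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            # get_starttag_text() returns the tag exactly as it appeared.
            self.parts.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        if tag != "a":
            self.parts.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag != "a":
            # The parser reports tag names lowercased.
            self.parts.append(f"</{tag}>")

    def handle_data(self, data):
        self.parts.append(data)

    def handle_entityref(self, name):
        self.parts.append(f"&{name};")

    def handle_charref(self, name):
        self.parts.append(f"&#{name};")

    def handle_comment(self, data):
        self.parts.append(f"<!--{data}-->")

    def handle_decl(self, decl):
        self.parts.append(f"<!{decl}>")


def strip_anchors(html):
    parser = AnchorStripper()
    parser.feed(html)
    parser.close()
    return "".join(parser.parts)


print(strip_anchors('<p>See <a href="xxxx" class="yyy">this <abbr>HTML</abbr> link</a> &amp; more</p>'))
```

This is only a sketch: end tags come back lowercased and other markup is passed through on a best-effort basis, so a dedicated sanitizer library is probably the safer pick for untrusted input.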
EDIT:
If you want to 'incorrectly' strip out </a>s that don't have any <a>s as well, do this:
</?[a\s]*[^>]*>
</?a.*?> would work. Replace each match with '' (an empty string).