I have blog data like:
This is foreign <a href="xyz.com">link</a>, this is my site's <a href="mysite.com">link</a> and so on.
What I want to do is strip the links to foreign sites, i.e. <a href="xyz.com">link</a>, while keeping their text, so that my final output is:
This is foreign link, this is my site's <a href="mysite.com">link</a> and so on.
I tried preg_replace, but none of the patterns I came up with worked.
First of all, I have to agree with the people who've already said that regexes are not the right tool for HTML.
That said, if what you want to do is no more complex than replacing any and all occurrences of
<a href="something.tld">foo</a>
with
foo
if something.tld is not your domain, then this should do the trick:
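// The scheme is optional (the sample hrefs have none) and the dot is escaped.
$mystring = preg_replace( '/<a href="(?!(?:https?:\/\/)?mysite\.com["\/])[^"]*"[^>]*>(.*?)<\/a>/i',
'$1',
$mystring );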
where $mystring is obviously the string you'd like to modify. However, this uses regex lookarounds, a pretty good giveaway that this was not meant to be done with regexes.
HTH
This shouldn't be done with regular expressions.
Try something like a DOM parser.
I don't know if you're using PHP, but this one is very easy to use:
http://simplehtmldom.sourceforge.net/
Hope this helps.
You can use DOMDocument to find all link elements and just update the source that way. I wrote a little example of how to use DOMDocument to find all links. I use this method to rewrite links in some projects I've worked on. I'm sure it wouldn't take much effort to go further and delete the a tag and replace it with text if the url does not match your host.
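To illustrate the DOMDocument approach, here is a minimal sketch, not a definitive implementation. The names stripForeignLinks and hostOf are hypothetical, and mysite.com stands in for your own host. It assumes that scheme-less hrefs such as "xyz.com" are bare hostnames, as in the question's sample, so relative links like "about.html" or "#top" would be misclassified as foreign and need extra handling.

```php
<?php
// Hypothetical helper: best-effort host extraction from an href.
function hostOf(string $href): string
{
    $host = parse_url($href, PHP_URL_HOST);
    if (!is_string($host)) {
        // Assumption: a scheme-less href like "xyz.com" is a bare hostname.
        $host = explode('/', $href)[0];
    }
    return strtolower($host);
}

// Hypothetical helper: replace every foreign <a> with its own contents.
function stripForeignLinks(string $html, string $ownHost): string
{
    $previous = libxml_use_internal_errors(true); // silence warnings on sloppy markup
    $doc = new DOMDocument();
    // Wrap the fragment in a known root; the XML declaration hints UTF-8 to libxml.
    $doc->loadHTML('<?xml encoding="UTF-8"><div>' . $html . '</div>');
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $wrapper = $doc->getElementsByTagName('div')->item(0);

    // Copy the live node list to an array before modifying the tree.
    foreach (iterator_to_array($doc->getElementsByTagName('a')) as $a) {
        $host = hostOf($a->getAttribute('href'));
        if ($host === '' || $host === strtolower($ownHost)) {
            continue; // empty host (e.g. "/path") or our own site: keep the link
        }
        // Move the link's children in front of it, then drop the empty <a>.
        while ($a->firstChild !== null) {
            $a->parentNode->insertBefore($a->firstChild, $a);
        }
        $a->parentNode->removeChild($a);
    }

    // Serialize only the wrapper's children, not the html/body scaffolding.
    $out = '';
    foreach ($wrapper->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}
```

Usage would look like $clean = stripForeignLinks($mystring, 'mysite.com');. Unlike the regex, this does not care about attribute order, extra attributes, or nested markup inside the link.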
I would strongly encourage you to use http://htmlpurifier.org/ , which will not only make it easy to write a link filter ( http://htmlpurifier.org/docs/enduser-uri-filter.html ) but also protect you from XSS attacks. If you aren't using a whitelisting HTML parser, you need to treat user-supplied data as literal text and escape HTML special characters.
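For the "treat it as literal" case, a minimal sketch using PHP's built-in htmlspecialchars() might look like this. The wrapper name renderAsLiteral is hypothetical, and UTF-8 input is an assumption.

```php
<?php
// Hypothetical wrapper; htmlspecialchars() itself is a PHP built-in.
function renderAsLiteral(string $userInput): string
{
    // ENT_QUOTES escapes both quote styles; ENT_SUBSTITUTE replaces invalid byte sequences.
    return htmlspecialchars($userInput, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');
}

echo renderAsLiteral('<a href="xyz.com">link</a>');
// prints: &lt;a href=&quot;xyz.com&quot;&gt;link&lt;/a&gt;
```

This shows the markup as visible text instead of rendering it, so it only fits if you don't need to keep any user-supplied HTML at all.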