开发者

"catching" links in regex using php ignoring inline js

开发者 https://www.devze.com 2023-03-22 09:46 出处:网络
I\'m stuck trying to make a regex in PHP that catches the link and its content from a html page (which I have no control over) and replaces it with a link of mine.

I'm stuck trying to make a regex in PHP that catches the link and its content from a html page (which I have no control over) and replaces it with a link of mine.

i.e.:

<a style="position:absolute;more_styles:more;" href="开发者_Go百科http://www.google.co.il/" class="something">This is the content</a>

Becomes:

<a style="position:absolute;more_styles:more;" href="my_function('http://www.google.co.il/')" class="something">This is the content</a>

This is the regex that I wrote:

$content = preg_replace('|<a(.*?)href=[\"\'](.*?)[\"\'][^>]*>(.*?)</a>|i','$3',$content);

This works well with all the links except links like:

<a href="http://google.co.il" onclick="if(MSIE_VER()>=4){this.style.behavior='url(#default#homepage)';this.setHomePage('http://www.google.co.il')}" class='brightgrey rightbar' style='font-size:12px'><b>Make me the home page!</b></a>

Obviously, the regexp stops at "MSIE_VER()>" because of the "[^>]*" part and i get the wrong content when I use "$3".

I tried almost every option to make this work but no luck.

Any thoughts?

Thank you all in advance..


First of all your code is trying to do something different that to add my_function - it tries to remove the starting tag and replace it with url only. There are several ways to acheieve your declared goal (i.e. substituing my_function to all hrefs) , the most pragmafic would be:

$content = preg_replace('|href=[\"\'](.*?)[\"\']|i',"href=\"my_function('$1')\"",$content);

if you need more prudent approach than I would use

$content = preg_replace('|(<a.*?)href=[\"\'](.*?)[\"\'](.*?</a>)|i',"$1href=\"my_function('$2')\"$3",$content);

last but not least if you need removing tag rather than what you have written, let me know there is million ways to do it.


By default .* will take evryting it can - eg. it takes onclick argument, because regex is still valid - replace "." with [^\"] - it will tell regexp to take evrything excluding " ( which cannot be in URL )

$content = preg_replace('|<a(.*?)href=[\"\']([^"]*?)[\"\'][^>]*>(.*?)</a>|i','$3',$content);
0

精彩评论

暂无评论...
验证码 换一张
取 消

关注公众号