I have some basic HTML that I am calling str_replace() on. I need to prepend 'generate_book.php?link=' to every URL found within the HTML string, but I need to exclude any external links, e.g.:
<a href="gst/3.html">Link</a>
-- this should become -- <a href="generate_book.php?link=gst/3.html">Link</a>
<a href="http://example.com">Link</a>
-- this should be left alone
Your brain powa is appreciated!
You'll want to use a negative look-ahead at the beginning to make sure the URL does not start with http:// or https://. You could also add mailto: if you are worried about it.
$str = preg_replace("/(?<=href=\")(?!http:\/\/|https:\/\/)([^\"]+)/i", "generate_book.php?link=$1", $str);
This regex also uses a look-behind (the (?<=href=\") part) so that it doesn't actually match the href=" itself.
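To see how the look-behind and look-ahead interact, here is a minimal sketch run against the two links from the question. The variable names $html, $pattern and $result are just for illustration, and https?:\/\/ is used as a shorter equivalent of the two alternatives above.

```php
<?php
// Sample input taken from the question; $html is an illustrative name.
$html = '<a href="gst/3.html">Link</a> <a href="http://example.com">Link</a>';

// (?<=href=")      look-behind: the match must be preceded by href="
// (?!https?:\/\/)  negative look-ahead: reject URLs starting with http:// or https://
// ([^"]+)          capture the URL up to the closing quote
$pattern = '/(?<=href=")(?!https?:\/\/)([^"]+)/i';

// Only the captured URL is replaced, so href=" and the closing quote survive.
$result = preg_replace($pattern, 'generate_book.php?link=$1', $html);

echo $result, PHP_EOL;
```

The relative link gets the prefix and the http:// link is left as it was, because the look-ahead fails at the only position where the look-behind can succeed.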
Warnings:
- Need to be aware of which URL schemes will be in the HTML besides HTTP and HTTPS, if any.
- Some tags, such as the link tag, also have an href attribute. Make sure you aren't replacing these. If you need to match only A tags using regex, the regex complexity will grow considerably and it still won't really be safe.
- Regex eval (the /e modifier) is much less efficient and is unsafe, but if you need URL encoding, you can URL-encode the value at replace time, as the second return in the other answer does.
- Overall, Regex is not necessarily the best solution for this. You might be better off with an HTML parser...
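For what it's worth, the parser route could look roughly like the sketch below, using PHP's bundled DOM extension. The function name prefix_internal_links is made up for this example, and treating anything that starts with a scheme (http:, https:, mailto:, ...) or with // as external is an assumption you may want to adjust.

```php
<?php
// Hypothetical helper: prefixes relative hrefs on <a> tags only.
function prefix_internal_links(string $html): string
{
    $doc = new DOMDocument();
    // libxml warns about imperfect markup; keep those warnings quiet.
    $previous = libxml_use_internal_errors(true);
    // Wrap the fragment in a <div> so it can be found again after parsing.
    $doc->loadHTML('<div>' . $html . '</div>');
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Only <a> elements are visited, so <link href="..."> is never touched.
    foreach ($doc->getElementsByTagName('a') as $a) {
        $href = $a->getAttribute('href');
        // Assumption: a leading scheme or "//" marks an external link.
        if ($href === '' || preg_match('#^([a-z][a-z0-9+.-]*:|//)#i', $href)) {
            continue;
        }
        $a->setAttribute('href', 'generate_book.php?link=' . urlencode($href));
    }

    // loadHTML() adds <html><body>; return only the wrapper's children.
    $wrapper = $doc->getElementsByTagName('div')->item(0);
    $out = '';
    foreach ($wrapper->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}

echo prefix_internal_links('<a href="gst/3.html">Link</a><a href="http://example.com">Link</a>'), PHP_EOL;
```

Because urlencode() is applied, the slash in gst/3.html comes out as %2F, so generate_book.php would read the original path back from $_GET['link']. Input containing non-ASCII characters may also need an explicit charset hint before loadHTML() handles it correctly.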
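Give this a try. It uses preg_replace_callback(), since the /e modifier is deprecated as of PHP 5.5 and removed in PHP 7:
$str = preg_replace_callback(
    '/href="([^"]+)"/i',
    function ($m) {
        // External link: return the whole href="..." match unchanged.
        if (preg_match('#^https?://#i', $m[1]))
            return $m[0];
        // Internal link: URL-encode it and prepend the script name.
        return 'href="generate_book.php?link=' . urlencode($m[1]) . '"';
    },
    $str);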