
What's a PHP regex to target certain URLs?

Developer https://www.devze.com 2023-01-28 14:44 Source: the web

I have some basic HTML on which I am calling str_replace(). I need to prefix every URL found within the HTML string with 'generate_book.php?link=', but I need to exclude any external links, e.g.:

<a href="gst/3.html">Link</a> -- this should become -- <a href="generate_book.php?link=gst/3.html">Link</a>

<a href="http://example.com">Link</a> -- this should be left alone

Your brain powa is appreciated!


You'll want to use a negative look-ahead at the start of the URL to make sure it does not begin with http:// or https://. You could also add mailto: to it if you are worried about that.

$str = preg_replace('~(?<=href=")(?!https?://)([^"]+)~i', 'generate_book.php?link=$1', $str);

This regex also uses a look-behind (the (?<=href=") part), so the href=" itself is not part of the match and is not replaced.
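A minimal sketch to see the look-behind and the negative look-ahead in action on the question's two links. The function name prefix_internal_links() is hypothetical, and the extra mailto: alternative in the look-ahead is an optional extension following the mailto suggestion, not part of the regex above.

```php
<?php
// Hypothetical helper: prefix every href value that is not an absolute
// http(s) link or a mailto: link with generate_book.php?link=
function prefix_internal_links(string $html): string
{
    // (?<=href=")            look-behind: must follow href=" (not consumed)
    // (?!https?://|mailto:)  negative look-ahead: skip external/mailto targets
    // ([^"]+)                the URL itself, up to the closing quote
    return preg_replace(
        '~(?<=href=")(?!https?://|mailto:)([^"]+)~i',
        'generate_book.php?link=$1',
        $html
    );
}

echo prefix_internal_links('<a href="gst/3.html">Link</a>'), "\n";
// <a href="generate_book.php?link=gst/3.html">Link</a>
echo prefix_internal_links('<a href="http://example.com">Link</a>'), "\n";
// <a href="http://example.com">Link</a>
```

Because the look-behind is zero-width, only the URL text is replaced, so the surrounding href=" and closing quote stay exactly as they were.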

Warnings:

  • You need to know which URL schemes other than HTTP and HTTPS (mailto:, ftp:, etc.) may appear in the HTML, if any.
  • Other tags, such as <link>, also have an href attribute; make sure you aren't rewriting those. Restricting the match to <a> tags with a regex makes the pattern considerably more complex, and it still won't be truly safe.
  • The /e (eval) modifier is much less efficient and unsafe (it was deprecated in PHP 5.5 and removed in PHP 7.0). If you need URL encoding, you can apply urlencode() at replace time, as the second return in the other answer does; preg_replace_callback() is the safe way to run code for each match.
  • Overall, a regex is not necessarily the best tool for this; you may be better off with an HTML parser such as PHP's DOMDocument.
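For the parser route, here is a minimal sketch with PHP's DOMDocument that visits only <a> elements, so <link href> is never touched. The function name prefix_internal_anchors() is hypothetical, and it assumes any href that starts with a URL scheme (http:, https:, mailto:, ...) or with // is external. It needs the dom extension, and non-ASCII input may need extra charset handling, since loadHTML() does not assume UTF-8 by default.

```php
<?php
// Hypothetical helper: prefix internal <a href> values using a real HTML parser.
// Assumption: an href with a URL scheme (http:, https:, mailto:, ...) or a
// protocol-relative "//host" form is external and is left alone.
function prefix_internal_anchors(string $html): string
{
    $doc = new DOMDocument();
    $prev = libxml_use_internal_errors(true); // tolerate sloppy real-world HTML

    // Wrap the fragment in a single root <div> so no <html>/<body> is implied
    $doc->loadHTML('<div>' . $html . '</div>', LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
    libxml_clear_errors();
    libxml_use_internal_errors($prev);

    // Only <a> elements are visited, so <link href="..."> is never rewritten
    foreach ($doc->getElementsByTagName('a') as $a) {
        $href = $a->getAttribute('href');
        if ($href === '' || preg_match('~^([a-z][a-z0-9+.-]*:|//)~i', $href)) {
            continue; // missing href, external scheme, or protocol-relative URL
        }
        $a->setAttribute('href', 'generate_book.php?link=' . $href);
    }

    // Serialize the wrapper's children, dropping the helper <div> itself
    $out = '';
    foreach ($doc->documentElement->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}

echo prefix_internal_anchors(
    '<p><a href="gst/3.html">Link</a> <a href="http://example.com">Link</a></p>'
), "\n";
```

The serializer may normalize whitespace or attribute quoting slightly, which is the usual trade-off for not having to reason about HTML with a regex.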


Give this a try:

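$str = preg_replace_callback(
    '~href="([^"]+)"~i',
    function ($m) {
        // External (absolute http/https) links: return the whole match unchanged
        if (preg_match('~^https?://~i', $m[1])) {
            return $m[0];
        }
        // Internal links: keep href="...", prefix and URL-encode the target
        return 'href="generate_book.php?link=' . urlencode($m[1]) . '"';
    },
    $str);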
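Here is the same idea written with preg_replace_callback() (the /e modifier no longer exists as of PHP 7), wrapped in a hypothetical function rewrite_links() and run on the question's links to check the output end to end. Note that urlencode() turns the slash into %2F; PHP decodes it again automatically when generate_book.php reads $_GET['link'].

```php
<?php
// Hypothetical wrapper: rewrite internal href values with a callback
// instead of the removed /e (eval) modifier.
function rewrite_links(string $html): string
{
    return preg_replace_callback(
        '~href="([^"]+)"~i',
        function (array $m): string {
            // Absolute http(s) links are external: keep the whole match as-is
            if (preg_match('~^https?://~i', $m[1])) {
                return $m[0];
            }
            // Internal link: prefix it and URL-encode the original target
            return 'href="generate_book.php?link=' . urlencode($m[1]) . '"';
        },
        $html
    );
}

echo rewrite_links('<a href="gst/3.html">Link</a> <a href="https://example.com">Ext</a>'), "\n";
// <a href="generate_book.php?link=gst%2F3.html">Link</a> <a href="https://example.com">Ext</a>

// In generate_book.php, $_GET['link'] already holds the decoded value;
// urldecode() reproduces that decoding here:
echo urldecode('gst%2F3.html'), "\n"; // gst/3.html
```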