Removing 'http://' from link via REGEX

Developer https://www.devze.com 2023-04-07 01:31 Source: Internet
What I would like to do is remove the "http://" part of these autogenerated links. Here is an example of one:

http://google.com/search?gc...

Here are the regexes I am using in PHP to generate these links from a URL.

    $patterns_sp[5] = '~([\S]+)~';                          
    $replaces_sp[5] = '<a href=\1 target="_blank">\1<br/>';

    $patterns_sp[6] = '~(?<=\>)([\S]{1,25})[^\s]+~';        
    $replaces_sp[6] = '\1...</a><br/>';

When these patterns are run on a URL like this:

http://www.google.com/search?gcx=c&ix=c1&sourceid=chrome&ie=UTF-8&q=regex

the REGEX gives me:

   <a href="http://www.google.com/search?gcx=c&ix=c1&sourceid=chrome&ie=UTF-8&q=regex" target="_blank">http://google.com/search?gc...</a>

Where I am stuck:

I see no obvious reason why I cannot modify the fourth line of code (the second pattern) to read like this:

    $patterns_sp[6] = '~(?<=\>http\:\/\/)([\S]{1,25})[^\s]+~';  

However, the regex still seems to leave the "http://" part of the address in the link text, which makes a long list of these links look very redundant. I am left with the same output as in the first example.


Replace...

    $patterns_sp[5] = '~([\S]+)~';

...with...

    $patterns_sp[5] = '~^(?:https?|ftp)://([\S]+)~';

Then you can access the protocol-less version with $1 and the whole link with $0.

Optionally, you can remove a leading protocol with something like...

    preg_replace('~^(?:https?|ftp)://~', '', $str);
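
As a rough sketch of how this could be wired into link generation in a single pass (this is not the original poster's code): the helper name linkify_without_protocol and its $maxLen parameter are made up for illustration, and it assumes the full URL should stay in href while the visible text drops the protocol and is cut to 25 characters.

```php
<?php
// Hypothetical helper (not from the question): turns each URL in $text into
// a link whose href keeps the full URL and whose label omits the protocol.
function linkify_without_protocol(string $text, int $maxLen = 25): string
{
    return preg_replace_callback(
        '~(?:https?|ftp)://(\S+)~',
        function (array $m) use ($maxLen): string {
            $url   = $m[0]; // whole link, protocol included
            $label = $m[1]; // same link without the leading protocol
            if (strlen($label) > $maxLen) {
                // Shorten long labels the same way the question does.
                $label = substr($label, 0, $maxLen) . '...';
            }
            // Escape both values so "&" in query strings is valid HTML.
            return '<a href="' . htmlspecialchars($url) . '" target="_blank">'
                 . htmlspecialchars($label) . '</a><br/>';
        },
        $text
    );
}

echo linkify_without_protocol(
    'http://www.google.com/search?gcx=c&ix=c1&sourceid=chrome&ie=UTF-8&q=regex'
);
```

For the URL from the question, the visible text becomes www.google.com/search?gcx... while href still points at the full address. Using a callback avoids the second truncation pattern, so there is no lookbehind to get wrong.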


I suggest not writing your own regex. Instead, have a look at parse_url: http://php.net/manual/en/function.parse-url.php

Retrieve the components of the URL, then compose a new version that only contains the parts you want.
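
A minimal sketch of that idea, assuming you only want to drop the scheme and keep everything else: the function name url_without_scheme is made up for illustration, and any user/password part of the URL is deliberately left out of the result.

```php
<?php
// Hypothetical helper: rebuilds a URL from its parsed components,
// leaving out the scheme ("http://", "https://", "ftp://", ...).
function url_without_scheme(string $url): string
{
    $parts = parse_url($url);
    if ($parts === false || !isset($parts['host'])) {
        return $url; // nothing parse_url could split up; leave it untouched
    }

    $out = $parts['host'];
    if (isset($parts['port']))     { $out .= ':' . $parts['port']; }
    if (isset($parts['path']))     { $out .= $parts['path']; }
    if (isset($parts['query']))    { $out .= '?' . $parts['query']; }
    if (isset($parts['fragment'])) { $out .= '#' . $parts['fragment']; }

    return $out;
}

echo url_without_scheme(
    'http://www.google.com/search?gcx=c&ix=c1&sourceid=chrome&ie=UTF-8&q=regex'
);
```

For the URL in the question this prints www.google.com/search?gcx=c&ix=c1&sourceid=chrome&ie=UTF-8&q=regex, which you can then truncate for display while keeping the original URL in href.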
