I have a big text in a PHP variable, and I'm looking for a good and fast method to retrieve all the links inside this text and store them in an array.
The text is plain ASCII, and the links are the common kind, like http://thesite.com or http://www.thesite.com. Thanks for any help.
$text = 'Lorem ipsum http://thesite.com dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor
incididunt https://www.thesite.com ut labore et dolore magna aliqua. Ut http://www.thesite.com enim ad minim veniam,';
$pattern = '!(https?://\S+)!'; // refine this for better/more specific results
if (preg_match_all($pattern, $text, $matches)) {
    $links = $matches[1]; // capture group 1 holds the URLs
    print_r($links);
}
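A slightly refined variant of the pattern above, as a minimal sketch: it assumes that trailing punctuation such as a period or comma belongs to the surrounding sentence rather than to the URL, and it drops duplicate links. The function name extract_links is hypothetical.

```php
<?php
/**
 * Hypothetical helper: collect the unique http(s) URLs found in plain text.
 * Assumption: trailing . , ; : ! ? ) characters are sentence punctuation,
 * not part of the URL.
 */
function extract_links(string $text): array
{
    // \S+ takes everything up to the next whitespace character.
    if (!preg_match_all('~https?://\S+~i', $text, $matches)) {
        return [];
    }
    $links = array_map(
        function (string $url): string {
            // Strip punctuation that belongs to the surrounding sentence.
            return rtrim($url, '.,;:!?)');
        },
        $matches[0]
    );
    // Drop duplicates and renumber the keys from 0.
    return array_values(array_unique($links));
}

$text = 'See http://thesite.com, or https://www.thesite.com. Also http://thesite.com again.';
print_r(extract_links($text));
```

Blindly trimming ")" breaks URLs that legitimately end in a parenthesis (some Wikipedia links do); the paired-parenthesis check in make_clickable handles that case more carefully.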
Search Google for a "URL regex", then insert it into the following code:
preg_match_all("/your url regex here/",$text,$matches);
All matches are now stored as an array in $matches[0].
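Regexes found this way differ a lot in how strict they are. One way to compensate, shown here as a sketch, is a two-pass approach: a loose pattern collects candidates, and PHP's built-in filter_var() with FILTER_VALIDATE_URL discards the ones it cannot parse as a valid URL. The sample text is made up, and FILTER_VALIDATE_URL only checks syntax, not whether the site exists.

```php
<?php
$text = 'Visit http://thesite.com or http:///broken or https://www.thesite.com/path?q=1 today.';

// Loose first pass: anything that starts with http:// or https://.
preg_match_all('~https?://\S+~i', $text, $matches);

// Second pass: keep only candidates that filter_var() accepts as URLs.
// "http:///broken" has no host, so it is rejected here.
$links = array_values(array_filter(
    $matches[0],
    function (string $candidate): bool {
        return filter_var($candidate, FILTER_VALIDATE_URL) !== false;
    }
));

print_r($links);
```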
The regexes here are all fine, but they tend to grow over time, and in the end things may look a bit different. Not all of this is my work, nor is it ideal: it uses code from a community project that has some years behind it. It is not perfect, but it suits some needs. I compiled it into a single function:
echo make_clickable('test http://www.google.com/');
/**
* make_clickable
*
* make a text clickable
*
* @param string $text to make clickable
* @param callback $url callback to process URLs
* @return string clickable text
* @author hakre and contributors
* @license GPL
*/
function make_clickable($text, $url = null) {
    if (null === $url) {
        $callback_url = function($url) {return $url;};
    } else {
        $callback_url = $url;
    }
    $ret = ' ' . $text;
    // urls
    $save = ini_set('pcre.recursion_limit', 10000);
    $retval = preg_replace_callback('#(?<!=[\'"])(?<=[*\')+.,;:!&$\s>])(\()?([\w]+?://(?:[\w\\x80-\\xff\#%~/?@\[\]-]{1,2000}|[\'*(+.,;:!=&$](?![\b\)]|(\))?([\s]|$))|(?(1)\)(?![\s<.,;:]|$)|\)))+)#is', function($matches) use ($callback_url)
    {
        $url = $matches[2];
        $suffix = '';
        /** Include parentheses in the URL only if paired **/
        while ( substr_count( $url, '(' ) < substr_count( $url, ')' ) ) {
            $suffix = strrchr( $url, ')' ) . $suffix;
            $url = substr( $url, 0, strrpos( $url, ')' ) );
        }
        $url = $callback_url($url);
        if ( empty($url) )
            return $matches[0];
        return $matches[1] . "<a href=\"$url\">$url</a>" . $suffix;
    }, $ret);
    if (null !== $retval )
        $ret = $retval;
    ini_set('pcre.recursion_limit', $save);
    // web ftp
    $ret = preg_replace_callback('#([\s>])((www|ftp)\.[\w\\x80-\\xff\#$%&~/.\-;:=,?@\[\]+]+)#is', function ($matches) use ($callback_url)
    {
        $ret = '';
        $dest = $matches[2];
        $dest = 'http://' . $dest;
        $dest = $callback_url($dest);
        if ( empty($dest) )
            return $matches[0];
        // remove trailing [.,;:)] from URL
        if ( in_array( substr($dest, -1), array('.', ',', ';', ':', ')') ) === true ) {
            $ret = substr($dest, -1);
            $dest = substr($dest, 0, strlen($dest)-1);
        }
        return $matches[1] . "<a href=\"$dest\">$dest</a>$ret";
    }, $ret);
    // email
    $ret = preg_replace_callback('#([\s>])([.0-9a-z_+-]+)@(([0-9a-z-]+\.)+[0-9a-z]{2,})#i', function($matches)
    {
        $email = $matches[2] . '@' . $matches[3];
        return $matches[1] . "<a href=\"mailto:$email\">$email</a>";
    }, $ret);
    // collapse nested links produced by overlapping passes
    $ret = preg_replace("#(<a( [^>]+?>|>))<a [^>]+?>([^>]+?)</a></a>#i", "$1$3</a>", $ret);
    $ret = trim($ret);
    return $ret;
}
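If you only need plain http(s) links turned into anchors, a much shorter sketch is possible. This one swaps in a different technique, preg_split() with PREG_SPLIT_DELIM_CAPTURE, so that every piece of text, URLs included, can be passed through htmlspecialchars() before it is written into HTML. The escaping is an added assumption about how the output will be used, and the name linkify is hypothetical; unlike make_clickable, it does not handle www./ftp. hosts, e-mail addresses, or trailing punctuation.

```php
<?php
/**
 * Hypothetical minimal linkifier: wraps http(s) URLs in <a> tags and
 * HTML-escapes everything, assuming the input is plain text.
 */
function linkify(string $text): string
{
    // Split on URLs but keep them: with one capture group, the parts
    // alternate text, URL, text, URL, ..., so URLs sit at odd indexes.
    $parts = preg_split('~(https?://\S+)~i', $text, -1, PREG_SPLIT_DELIM_CAPTURE);
    $html = '';
    foreach ($parts as $i => $part) {
        $safe = htmlspecialchars($part, ENT_QUOTES, 'UTF-8');
        $html .= ($i % 2 === 1)
            ? '<a href="' . $safe . '">' . $safe . '</a>'
            : $safe;
    }
    return $html;
}

echo linkify('Go to http://thesite.com now & later'), PHP_EOL;
```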
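You have to use regular expressions. PHP historically had two families of regex functions, preg (PCRE) and ereg (POSIX). ereg was considered easier to use but slower, and it was deprecated in PHP 5.3.0 and removed in PHP 7.0, so use preg.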
Here is a simple preg_match_all() call that will get the URLs from $text:
preg_match_all("/https?:\/\/[^\s]+/i", $text, $urls);
$urls[0] is an array of your URLs.
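To make the structure of the result concrete, here is a small sketch with made-up sample text. preg_match_all() fills $urls with one sub-array per group (the full matches first) and returns the number of matches, or false on error.

```php
<?php
$text = 'Lorem http://thesite.com ipsum https://www.thesite.com dolor.';

// The return value is the number of full-pattern matches.
$count = preg_match_all('/https?:\/\/[^\s]+/i', $text, $urls);

// With the default PREG_PATTERN_ORDER, $urls[0] holds every full match.
$links = $urls[0];

echo $count, PHP_EOL;
print_r($links);
```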