Regular expression to get the main domain of a URL

Source: the web, via https://www.devze.com, 2023-01-29 12:24
I have never used regex before, and I was wondering how to write a regular expression in PHP that gets the domain of a URL. For example: http://www.hegnar.no/bors/article488276.ece --> hegnar.no


You don't need a regular expression for this task.

Check PHP's built-in function, parse_url(): http://php.net/manual/en/function.parse-url.php


Just use parse_url() if you are specifically dealing with URLs.

For example:

$url = "http://www.hegnar.no/bors/article488276.ece";
$url_u_want = parse_url($url, PHP_URL_HOST);

Docs: http://php.net/manual/en/function.parse-url.php

EDIT: To strip the leading www., use:

$url_u_want = preg_replace("/^www\./", "", $url_u_want);
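
The two steps above can be combined into one helper. This is only a minimal sketch: the function name host_without_www is hypothetical (it is not from the answer), and it assumes the input is an absolute URL with a scheme, since parse_url() finds no host in a bare path.

```php
<?php
// Hypothetical helper: returns the host of $url without a leading "www.",
// or null when parse_url() cannot find a host.
function host_without_www(string $url): ?string
{
    $host = parse_url($url, PHP_URL_HOST);
    if (!is_string($host) || $host === '') {
        return null;
    }
    // Case-insensitive, so "WWW.example.com" is handled too.
    return preg_replace('/^www\./i', '', $host);
}

echo host_without_www("http://www.hegnar.no/bors/article488276.ece"), "\n"; // hegnar.no
```

Note that this only removes www.; any other subdomain is left in place.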


$page = "http://google.no/page/page_1.html";
preg_match_all("/((?:[a-z][a-z\\.\\d\\-]+)\\.(?:[a-z][a-z\\-]+))(?![\\w\\.])/", $page, $result, PREG_PATTERN_ORDER);

print_r($result);


$host = parse_url($url, PHP_URL_HOST);
$host = array_reverse(explode('.', $host));
$host = $host[1].'.'.$host[0];
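
The snippet above assumes the host has at least two labels and that the main domain is exactly the last two. A hedged sketch of the same idea with a guard for short hosts follows; the function name last_two_labels is hypothetical. It still returns co.uk for a host like www.example.co.uk, because getting that case right would need a list of public suffixes, which this sketch does not attempt.

```php
<?php
// Hypothetical helper: keeps only the last two dot-separated labels of the host.
function last_two_labels(string $url): ?string
{
    $host = parse_url($url, PHP_URL_HOST);
    if (!is_string($host)) {
        return null;
    }
    $labels = explode('.', $host);
    if (count($labels) < 2) {
        return null; // e.g. "localhost": there is no second label to join
    }
    return implode('.', array_slice($labels, -2));
}

echo last_two_labels("http://www.hegnar.no/bors/article488276.ece"), "\n"; // hegnar.no
echo last_two_labels("http://www.example.co.uk/"), "\n";                   // co.uk (known limitation)
```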


See

PHP Regex for extracting subdomains of arbitrary domains

and

Javascript/Regex for finding just the root domain name without sub domains


The problem with parse_url() is that it does not validate the host. If the URL has no extension (.com, .net, and so on), parse_url() still returns a host, here bannedadsense, as if the lookup had succeeded, even though bannedadsense is not a domain.

$url = 'http://bannedadsense/isbanned'; // this host fails the preg_match check
//$url = 'http://bannedadsense.com/isbanned'; // this host passes the preg_match check
$domain = parse_url($url, PHP_URL_HOST);
// $domain is "bannedadsense": parse_url() returns it even though it is not a real domain.

So we need one more check to reject hosts that have no dot extension (.com, .net, .org, etc.):

if(preg_match("/^[a-zA-Z0-9][a-zA-Z0-9-]{1,61}[a-zA-Z0-9](?:\.[a-zA-Z]{2,})+$/i",$domain)) {
    echo $domain;
}else{
    echo "<br>";
    echo "false";
}
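
The parse_url() call and the check above can be wrapped in one function, sketched below. The name domain_or_false is hypothetical; the pattern is the one from this answer, unchanged. One caveat: every label after the first may contain letters only, so a host such as sub.my-site.com is rejected as well.

```php
<?php
// Hypothetical helper: returns the host only if it looks like a domain
// with at least one dot-separated extension, otherwise false.
function domain_or_false(string $url)
{
    $domain = parse_url($url, PHP_URL_HOST);
    if (!is_string($domain)) {
        return false;
    }
    $pattern = '/^[a-zA-Z0-9][a-zA-Z0-9-]{1,61}[a-zA-Z0-9](?:\.[a-zA-Z]{2,})+$/i';
    return preg_match($pattern, $domain) === 1 ? $domain : false;
}

var_dump(domain_or_false('http://bannedadsense/isbanned'));     // bool(false)
var_dump(domain_or_false('http://bannedadsense.com/isbanned')); // string(17) "bannedadsense.com"
```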