PHP HTTP_HOST subdomain extraction when the subdomain can be a wildcard and contain more than one '.'

Developer https://www.devze.com 2023-01-10 03:37 Source: Internet

I'm trying to extract the subdomain from the HTTP_HOST value. However, I've run into a problem: if the subdomain has more than one dot in it, it fails to match properly. Given that this script will run on multiple different domains, that the subdomain could contain any number of dots, and that the TLD could have either 1 or 2 parts (of any length), is there a practical way of correctly matching the subdomain, domain and TLD in all situations?

For example, take the following HTTP_HOST values and the parts that need to be matched:

  • www.buggedcom.co.uk
    • Subdomain: www
    • Domain: buggedcom.co.uk
    • TLD: co.uk
  • www.buggedcom.com
    • Subdomain: www
    • Domain: buggedcom.com
    • TLD: com
  • test.buggedcom.co.uk
    • Subdomain: test
    • Domain: buggedcom.co.uk
    • TLD: co.uk
  • test.buggedcom.com
    • Subdomain: test
    • Domain: buggedcom.com
    • TLD: com
  • multi.sub.test.buggedcom.co.uk
    • Subdomain: multi.sub.test
    • Domain: buggedcom.co.uk
    • TLD: co.uk
  • multi.sub.test.buggedcom.com
    • Subdomain: multi.sub.test
    • Domain: buggedcom.com
    • TLD: com

I am presuming that the only way to accomplish this would be to load a list of TLDs, which I don't really want to do, as this is at the start of a script and shouldn't really require heavy lifting like that.

Below is the current code.

// Prefer the Host header, then fall back to the server address or name.
define('HOST', isset($_SERVER['HTTP_HOST']) === true ? $_SERVER['HTTP_HOST'] : (isset($_SERVER['SERVER_ADDR']) === true ? $_SERVER['SERVER_ADDR'] : $_SERVER['SERVER_NAME']));
$domain_parts = explode('.', HOST);
$domain_parts_count = count($domain_parts);
if($domain_parts_count > 1)
{
    // Assumes the domain and TLD always occupy the last three parts,
    // which holds for buggedcom.co.uk but not for buggedcom.com.
    $sub_parts = array_splice($domain_parts, 0, $domain_parts_count-3);
    define('SUBDOMAIN', implode('.', $sub_parts));
    unset($sub_parts);
}
else
{
    define('SUBDOMAIN', '');
}
define('DOMAIN', implode('.', $domain_parts));
var_dump($domain_parts, SUBDOMAIN, DOMAIN);exit;

Just a thought: could mod_rewrite append the subdomain as a GET parameter?


First of all, I would explode the value on a slash (and use the first element of the resulting array), just to be sure that the string ends with the TLD.

Then I would cut it with preg_replace. This regexp matches the domain + TLD regardless of TLD type. Beware, however, that it has a problem with 2- and 3-letter domains: a name such as abc.com is itself matched as a two-part TLD. But it should give you a push in the right direction...

[a-zA-Z0-9]+\.(([a-zA-Z]{2,6})|([a-zA-Z]{2,3}\.[a-zA-Z]{2,3}))$

Edit: as pointed out, .museum is also possible, so I edited the first pattern in the TLD part.

And of course TLDs like .uk could behave differently than co.uk. Ugh, it's not that easy...
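
As a rough illustration of the two steps above, here is a minimal sketch that applies the same regexp with preg_match instead of preg_replace, so the match offset can be used to cut off the subdomain. The function name split_host_by_regexp and the shape of the returned array are assumptions made for this example, not part of the answer.

```php
<?php
// Hypothetical helper (not from the answer): applies the answer's regexp with preg_match.
function split_host_by_regexp(string $host): ?array
{
    // Keep only the part before the first slash, so the string ends with the TLD.
    $host = strtolower(explode('/', $host)[0]);

    // The pattern from the answer, wrapped in delimiters.
    $pattern = '/[a-zA-Z0-9]+\.(([a-zA-Z]{2,6})|([a-zA-Z]{2,3}\.[a-zA-Z]{2,3}))$/';
    if (preg_match($pattern, $host, $m, PREG_OFFSET_CAPTURE) !== 1) {
        return null; // no "name.tld" ending found
    }

    // $m[0] is [matched domain, byte offset]; everything before the offset is the subdomain.
    return [
        'subdomain' => rtrim(substr($host, 0, $m[0][1]), '.'),
        'domain'    => $m[0][0],
        'tld'       => $m[1][0],
    ];
}
```

With this sketch, multi.sub.test.buggedcom.co.uk splits into multi.sub.test, buggedcom.co.uk and co.uk. The caveat about short names shows up with www.abc.com: the whole host is returned as the domain and abc.com as the TLD.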


I think this is better handled with code from others who have tried to do the same thing. There are a number of URL parsing functions in the comments on the PHP docs page for the parse_url function that might work better: http://www.php.net/manual/en/function.parse-url.php


With preg_match, you can extract the subdomain and tld parts in one go, like this:

function get_domain_parts($domain) {
    $parts = array();
    $pattern = "/(.*)\.buggedcom\.(.*)/";
    if (preg_match($pattern, $domain, $parts) == 1) {
        return array($parts[1], $parts[2]);
    } else {
        return FALSE;
    }
}

$result = get_domain_parts("multi.sub.test.buggedcom.co.uk");
if ($result) {
    echo($result[0] . " and " . $result[1]); // multi.sub.test and co.uk   
}
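
Since the script in the question runs on several domains, the hard-coded name could be swapped for a list. The following is only a sketch of that idea: get_domain_parts_for and its $known_names parameter are hypothetical, and the names are assumed to be lowercase labels such as "buggedcom".

```php
<?php
// Hypothetical variant: $known_names is an assumed list of domain labels this script serves.
function get_domain_parts_for(string $host, array $known_names)
{
    $host = strtolower($host);
    foreach ($known_names as $name) {
        // preg_quote escapes any regexp metacharacters in the name.
        // The subdomain group is optional so a bare "name.tld" host also matches.
        $pattern = '/^(?:(.*)\.)?' . preg_quote($name, '/') . '\.(.+)$/';
        if (preg_match($pattern, $host, $parts) === 1) {
            // Returns subdomain, domain and TLD, in that order.
            return array($parts[1], $name . '.' . $parts[2], $parts[2]);
        }
    }
    return false; // none of the known names appear in the host
}
```

This avoids any TLD list, at the cost of having to know every domain name the script is deployed on.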


Not to be nit-picky, but technically speaking .co.uk is a second level domain.

.uk is the "Country Code Top Level Domain" in that case, and .co stands for "Commercial Use", as defined by the United Kingdom.

This might not answer your question though.

Wikipedia has a pretty complete list of TLDs. As you can see, each one contains only 1 "dot" followed by 1 "string".
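
If every real TLD is a single label, the two-part cases like co.uk have to come from a separate list of second-level suffixes. Here is a minimal sketch of that, assuming a plain host name with no port. The TWO_PART_SUFFIXES constant is a tiny hand-picked sample for illustration only, not a complete list, and the function name is made up for this example. A fuller list of such suffixes is published as the Public Suffix List.

```php
<?php
// Illustrative only: a small, hand-picked sample of two-part suffixes (far from complete).
const TWO_PART_SUFFIXES = ['co.uk', 'org.uk', 'com.au', 'co.nz'];

// Hypothetical helper: splits a host into subdomain, domain and TLD using the sample list.
function split_host_by_suffix_list(string $host, array $twoPartSuffixes = TWO_PART_SUFFIXES): array
{
    $labels = explode('.', strtolower($host));
    $count = count($labels);

    // Assume a one-label suffix unless the last two labels are in the list.
    $suffixLen = 1;
    if ($count >= 2 && in_array(implode('.', array_slice($labels, -2)), $twoPartSuffixes, true)) {
        $suffixLen = 2;
    }

    // The domain is the suffix plus one more label, when the host has that many.
    $domainLen = min($count, $suffixLen + 1);

    return [
        'subdomain' => implode('.', array_slice($labels, 0, $count - $domainLen)),
        'domain'    => implode('.', array_slice($labels, -$domainLen)),
        'tld'       => implode('.', array_slice($labels, -$suffixLen)),
    ];
}
```

The lookup is a single in_array call on a short array, so it is cheap enough to run at the start of a script. The trade-off is that any two-part suffix missing from the list is treated as a domain plus a one-label TLD.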
