
Extracting top-level domain names from a list of website addresses

Developer (https://www.devze.com), 2023-01-12 00:33. Source: Internet

I have a list of web addresses, such as those listed below, in my DB.

I need to get the domain name from each address in the list.

  • http://en.wordpress.com/tag/1000-things-we-hate/
  • http://en.wordpress.com/tag/1019/
  • http://en.wordpress.com/tag/1030-am/
  • http://www.yahoo.com/index.html
  • http://www.msn.com/index.html


Here's a way to do it in Java:

String input = "http://en.wordpress.com/tag/1000-things-we-hate/";
// Assuming that all urls start with "http://"
int finish = input.indexOf("/", 7);
if(finish == -1)
{
  finish = input.length();
}
System.out.println(input.substring(7, finish));

Prints en.wordpress.com (I assume that is what you want?)
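
If the addresses do not all start with "http://" (for example, some use "https://"), the fixed offset of 7 no longer works. Below is a minimal sketch of an alternative that lets java.net.URI do the parsing and applies it to the whole list from the question. The class name HostExtractor and its method names are made up for illustration, the list is hard-coded rather than read from a DB, and List.of assumes Java 9 or later.

```java
import java.net.URI;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper class, not part of any library.
class HostExtractor {
    // Returns the host part of an absolute URL, or null if the URI has no host.
    // URI.create throws IllegalArgumentException for a malformed address.
    static String hostOf(String url) {
        return URI.create(url).getHost();
    }

    // Collects the distinct hosts of a list of URLs, preserving first-seen order.
    static Set<String> distinctHosts(List<String> urls) {
        Set<String> hosts = new LinkedHashSet<>();
        for (String url : urls) {
            String host = hostOf(url);
            if (host != null) {
                hosts.add(host);
            }
        }
        return hosts;
    }

    public static void main(String[] args) {
        // Stand-in for the rows that would come from the DB.
        List<String> urls = List.of(
            "http://en.wordpress.com/tag/1000-things-we-hate/",
            "http://en.wordpress.com/tag/1019/",
            "http://en.wordpress.com/tag/1030-am/",
            "http://www.yahoo.com/index.html",
            "http://www.msn.com/index.html");
        // Prints [en.wordpress.com, www.yahoo.com, www.msn.com]
        System.out.println(distinctHosts(urls));
    }
}
```

Like the substring version, this returns the full host name including any subdomain such as "en" or "www".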


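<?php
$url = "http://en.wordpress.com/tag/1000-things-we-hate/";
// Splitting on "/" gives ["http:", "", "en.wordpress.com", "tag", ...],
// so the host is at index 2, not index 1.
$bits = explode("/", $url);
$nextBits = explode(".", $bits[2]);
$count = count($nextBits);
// Join the last two labels, e.g. "wordpress.com".
// This is wrong for two-label suffixes such as "co.uk".
$domain = $nextBits[$count-2].".".$nextBits[$count-1];
echo $domain;
?>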


<?php
$url = "http://en.wordpress.com/tag/1000-things-we-hate/";
echo parse_url($url, PHP_URL_HOST);

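That would return "en.wordpress.com". If you don't want subdomains (i.e. only "wordpress.com"), then things get complicated, because some suffixes (such as "co.uk") span more than one label, so you cannot simply keep the last two parts. You would need something like http://www.dkim-reputation.org/regdom-libs/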
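
To illustrate why dropping subdomains needs a suffix list, here is a rough Java sketch of the idea. It is not the API of the regdom library linked above: the class RegisteredDomain, its method of, and the five-entry SUFFIXES set are all made up for illustration. A real implementation would load the full Public Suffix List and handle its wildcard and exception rules, which this sketch ignores. The host "www.bbc.co.uk" is only an example input and is not from the question.

```java
import java.util.Arrays;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper class, not part of any library.
class RegisteredDomain {
    // Deliberately tiny, hard-coded suffix set (an assumption for this sketch).
    static final Set<String> SUFFIXES = Set.of("com", "org", "net", "uk", "co.uk");

    // Returns the longest known suffix plus the one label before it,
    // or null if no known suffix matches (or the host is itself a bare suffix).
    static String of(String host) {
        String[] labels = host.toLowerCase(Locale.ROOT).split("\\.");
        // Start at index 1 so the candidate suffix is never the whole host;
        // smaller i means a longer suffix, so the longest match wins.
        for (int i = 1; i < labels.length; i++) {
            String suffix = String.join(".", Arrays.copyOfRange(labels, i, labels.length));
            if (SUFFIXES.contains(suffix)) {
                return labels[i - 1] + "." + suffix;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(of("en.wordpress.com")); // prints wordpress.com
        System.out.println(of("www.bbc.co.uk"));    // prints bbc.co.uk
    }
}
```

Taking the last two labels would turn the second host into "co.uk", which is why the suffix lookup is needed.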


Use the parse_url function in PHP.
