
What regex can I use to get the domain name from a url in Ruby?

Developer https://www.devze.com 2023-01-08 21:53 Source: the web

I am trying to construct a regex to extract a domain given a URL.

for:

http://www.abc.google.com/
http://abc.google.com/
https://www.abc.google.com/
http://abc.google.com/

should give:

abc.google.com


require 'uri'

URI.parse('http://www.abc.google.com/').host
#=> "www.abc.google.com"

Not a regex, but probably more robust than anything we come up with here.

URI.parse('http://www.abc.google.com/').host.gsub(/^www\./, '')

If you also want to remove the www. prefix, this works without raising an error when the prefix is not there.
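The two steps above can be wrapped in a small helper. This is only a sketch: the name extract_host is hypothetical, and returning nil for unparseable or host-less input is an assumption about what the caller wants, not something the question specifies.

```ruby
require 'uri'

# Hypothetical helper (the name is not from the question): returns the host
# of url without a leading "www.", or nil when the URL cannot be parsed or
# has no host part.
def extract_host(url)
  host = URI.parse(url).host
  # sub (rather than gsub) is enough: the prefix can occur at most once
  # at the start of the string.
  host && host.sub(/\Awww\./, '')
rescue URI::InvalidURIError
  nil
end

urls = %w[
  http://www.abc.google.com/
  http://abc.google.com/
  https://www.abc.google.com/
]
urls.each { |url| puts extract_host(url) }
```

Note that a string without a scheme, such as abc.google.com, is parsed as a path rather than a host, so the helper returns nil for it.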


I don't know much about Ruby, but this regex pattern gives you the last 3 parts of the URL, excluding the trailing slash, with a minimum of 2 characters per part.

([\w-]{2,}\.[\w-]{2,}\.[\w-]{2,})/$
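For reference, here is one way the pattern above might be applied in Ruby. The constant and method names are made up for illustration. As written, the pattern only matches when the URL ends with a slash directly after the host and the host has at least three dot-separated parts.

```ruby
# The pattern from the answer above as a Ruby Regexp literal.
# %r{...} avoids having to escape the forward slash.
LAST_THREE_LABELS = %r{([\w-]{2,}\.[\w-]{2,}\.[\w-]{2,})/$}

# Hypothetical helper: returns the first capture group, or nil when the
# pattern does not match.
def last_three_labels(url)
  url[LAST_THREE_LABELS, 1]
end

puts last_three_labels('http://www.abc.google.com/')
puts last_three_labels('http://abc.google.com/')
```

URLs with a path, a missing trailing slash, or a two-part host such as google.com return nil, which is one reason the URI-based approach tends to be easier to rely on.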


You may be able to use the domain_name gem for this kind of work. From the README:

require "domain_name"
host = DomainName("a.b.example.co.uk")
host.domain         #=> "example.co.uk"


Your question is a little bit vague. Can you give a precise specification of what exactly you want to do? (Preferably with a test suite.) Right now, all your question says is that you want a method that always returns 'abc.google.com'. That's easy:

def extract_domain
  return 'abc.google.com'
end

But that's probably not what you meant …

Also, you say that you need a Regexp. Why? What's wrong with, for example, using the URI class? After all, parsing and manipulating URIs is exactly what it was made for!

require 'uri'

URI.parse('https://abc.google.com/').host # => 'abc.google.com'

And lastly, you say you are "trying to extract a domain", but you never specify what you mean by "domain". It looks like you sometimes mean the FQDN and sometimes randomly drop parts of the FQDN, but according to what rules? For example, for the FQDN abc.google.com, the domain name is google.com and the host name is abc, but you want it to return abc.google.com, which is not just the domain name but the whole FQDN. Why?
