Possible Duplicates:
How can I alter this regex to get the Youtube video id from a Youtube URL that doesn't specify the v parameter?
What regex can I use to get the domain name from a url in Ruby?
Improving regex for parsing YouTube / Vimeo URLs
What regex validates that a string is a URL for a YouTube or Vimeo video? I'm not very good with regular expressions. This is for a Rails application.
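For YouTube:
# \A anchors at the start of the string (^ would match at the start of any line);
# + requires at least one id character.
yt_regexp = /\Ahttp:\/\/www\.youtube\.com\/watch\?v=([a-zA-Z0-9_-]+)/
The first capture group also gives you the id of the video: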
>> yt_regexp.match("http://www.youtube.com/watch?v=foo")[1]
=> "foo"
For Vimeo:
vimeo_regexp = /\Ahttp:\/\/www\.vimeo\.com\/(\d+)/
You can extract the id from the first capture group in the same way.
If you want to make "http://www." optional, you can use:
yt_regexp = /\A(?:http:\/\/)?(?:www\.)?youtube\.com\/watch\?v=([a-zA-Z0-9_-]+)/
vimeo_regexp = /\A(?:http:\/\/)?(?:www\.)?vimeo\.com\/(\d+)/
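To tie the two patterns together, here is a minimal sketch of a single lookup method. The name video_id is hypothetical, and the patterns make assumptions beyond the answer above: they also accept https, anchor with \A, and require at least one id character.

```ruby
# Hypothetical helper: returns [provider, id] for a recognised video URL, or nil.
def video_id(url)
  patterns = {
    # Assumption: scheme and "www." are optional, and https is accepted too.
    youtube: %r{\A(?:https?://)?(?:www\.)?youtube\.com/watch\?v=([a-zA-Z0-9_-]+)},
    vimeo:   %r{\A(?:https?://)?(?:www\.)?vimeo\.com/(\d+)}
  }

  patterns.each do |provider, pattern|
    match = pattern.match(url)
    return [provider, match[1]] if match
  end

  nil # no pattern matched
end

video_id("http://www.youtube.com/watch?v=foo") # => [:youtube, "foo"]
video_id("vimeo.com/12345")                    # => [:vimeo, "12345"]
video_id("http://example.com/watch?v=foo")     # => nil
```

In a Rails model, a nil result could be treated as a validation failure.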
A regex is one way to get there, but it's not what I'd use. I prefer a URL parser, such as Ruby's built-in URI or the Addressable::URI gem. URLs can get messy: there are multiple ways to designate a site in a URL that resolve and connect to a particular host, yet fail the usual "check for the host name" test.
require 'uri'
url = 'http://www.youtube.com/watch?v=_NaiiBkqOxE&feature=feedu'
uri = URI.parse(url)
uri.host # => "www.youtube.com"
A couple of ways to test the host:
uri.host['youtube.com'] # => "youtube.com"
uri.host =~ /youtube\.com/ # => 4
!!uri.host['youtube.com'] # => true
!!(uri.host =~ /youtube\.com/) # => true
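Note that a substring or unanchored regex test also accepts hosts such as notyoutube.com or youtube.com.example.org. If that matters, a stricter comparison could look like the sketch below. The method name host_matches? is hypothetical, and treating any subdomain of the given domain as acceptable is an assumption.

```ruby
require 'uri'

# Hypothetical helper: true when the URL's host is `domain` itself
# or a subdomain of it (e.g. "www.youtube.com" for "youtube.com").
def host_matches?(url, domain)
  host = URI.parse(url).host
  return false if host.nil? # e.g. a URL written without a scheme

  host = host.downcase
  host == domain || host.end_with?(".#{domain}")
rescue URI::InvalidURIError
  false # unparseable input is treated as a non-match
end

host_matches?('http://www.youtube.com/watch?v=_NaiiBkqOxE', 'youtube.com') # => true
host_matches?('http://youtube.com.example.org/watch?v=x', 'youtube.com')    # => false
host_matches?('http://notyoutube.com/watch?v=x', 'youtube.com')             # => false
```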
Usually our needs are more sophisticated: we want to know which parameters are embedded in the URL, or what the path to the resource is. URI.split breaks the URL into its component pieces:
URI.split(url) # => ["http", nil, "www.youtube.com", nil, nil, "/watch", nil, "v=_NaiiBkqOxE&feature=feedu", nil]
Each of the pieces has a defined name, so it's common to break the URL down into elements in a hash. You can create a hash of all the parts for fast lookup:
parts = [:scheme, :userinfo, :host, :port, :registry, :path, :opaque, :query, :fragment].zip(URI.split(url)).to_h
parts # => {:scheme=>"http", :userinfo=>nil, :host=>"www.youtube.com", :port=>nil, :registry=>nil, :path=>"/watch", :opaque=>nil, :query=>"v=_NaiiBkqOxE&feature=feedu", :fragment=>nil}
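Once the query string is isolated, the individual parameters can be decoded with the standard library as well. This is a small sketch using URI.decode_www_form; the method name query_param is hypothetical.

```ruby
require 'uri'

# Hypothetical helper: returns the value of one query parameter, or nil.
def query_param(url, name)
  query = URI.parse(url).query
  return nil if query.nil?

  # decode_www_form returns [[key, value], ...]; to_h turns the pairs into a hash.
  URI.decode_www_form(query).to_h[name]
end

url = 'http://www.youtube.com/watch?v=_NaiiBkqOxE&feature=feedu'
query_param(url, 'v')       # => "_NaiiBkqOxE"
query_param(url, 'feature') # => "feedu"
query_param(url, 'missing') # => nil
```

For a YouTube watch URL, the value of v is the video id, so no capture-group regex is needed for this step.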
Using Addressable::URI to do the same things:
require 'addressable/uri'
uri = Addressable::URI.parse('http://www.youtube.com/watch?v=_NaiiBkqOxE&feature=feedu')
uri.host # => "www.youtube.com"
parts = uri.to_hash
parts # => {:scheme=>"http", :user=>nil, :password=>nil, :host=>"www.youtube.com", :port=>nil, :path=>"/watch", :query=>"v=_NaiiBkqOxE&feature=feedu", :fragment=>nil}
Wikipedia's page on URL normalization shows many examples of how URLs can vary yet still point to the same resource. So, if you only need to match the main domain of a site, then yes, you can use a simple regex, or even a substring search. Once you need more than that, you have to be more careful about how you take the URL apart.
I'm not familiar with Vimeo, but for YouTube it would be:
"http://www.youtube.com/watch?v=".+
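Note the quote marks. In regex engines that support quoted literals, they tell the engine to match exactly the text between them. Ruby's Regexp has no such syntax, so there you would escape the periods and the question mark instead (for example with Regexp.escape). Otherwise you will be surprised when those characters are treated as metacharacters. The trailing .+ then matches the arbitrary string that finishes off the URL.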
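In Ruby, the same "match this text literally" idea can be sketched with Regexp.escape. The method name starts_with_literal? is hypothetical, and anchoring with \A is an added assumption.

```ruby
# Hypothetical helper: true when `text` begins with `literal`
# (taken verbatim) followed by at least one more character.
def starts_with_literal?(text, literal)
  # Regexp.escape backslash-escapes ".", "?" and other metacharacters.
  /\A#{Regexp.escape(literal)}.+/.match?(text)
end

prefix = 'http://www.youtube.com/watch?v='
starts_with_literal?('http://www.youtube.com/watch?v=foo', prefix) # => true
starts_with_literal?('http://wwwXyoutube.com/watch?v=foo', prefix) # => false
starts_with_literal?(prefix, prefix)                               # => false
```

Without the escaping, the ? would make the preceding h optional instead of matching a literal question mark, so the unescaped pattern would not match a real watch URL at all.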