Improving regex for parsing YouTube / Vimeo URLs

Developer https://www.devze.com 2023-02-23 03:43 Source: the web
I've made a function (in JavaScript) that takes an URL from either YouTube or Vimeo. It figures out the provider and ID for that particular video (demo: http://jsfiddle.net/csjwf/).

function parseVideoURL(url) {

    var provider = url.match(/http:\/\/(:?www.)?(\w*)/)[2],
        id;

    if(provider == "youtube") {

        id = url.match(/http:\/\/(?:www.)?(\w*).com\/.*v=(\w*)/)[2];
    } else if (provider == "vimeo") {

        id = url.match(/http:\/\/(?:www.)?(\w*).com\/(\d*)/)[2];
    } else {
        throw new Error("parseVideoURL() takes a YouTube or Vimeo URL");    
    }
    return {
        provider : provider,
        id : id
    }
}

It works; however, as a regex novice, I'm looking for ways to improve it. The input I'm dealing with typically looks like this:

http://vimeo.com/(id)
http://youtube.com/watch?v=(id)&blahblahblah.....

1) Right now I'm doing three separate matches. Would it make sense to try to do everything in one single expression? If so, how?

2) Could the existing matches be more concise? Are they unnecessarily complex, or perhaps insufficient?

3) Are there any YouTube or Vimeo URLs that would fail to parse? I've tried quite a few and so far it seems to work pretty well.

To summarize: I'm simply looking for ways to improve the above function. Any advice is greatly appreciated.


Here's my attempt at the regex, which covers most updated cases:

function parseVideo(url) {
    // - Supported YouTube URL formats:
    //   - http://www.youtube.com/watch?v=My2FRPA3Gf8
    //   - http://youtu.be/My2FRPA3Gf8
    //   - https://youtube.googleapis.com/v/My2FRPA3Gf8
    // - Supported Vimeo URL formats:
    //   - http://vimeo.com/25451551
    //   - http://player.vimeo.com/video/25451551
    // - Also supports relative URLs:
    //   - //player.vimeo.com/video/25451551

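    // Note the colon after the optional protocol; relative URLs start with "//".
    var match = url.match(/(https?:)?\/\/(player\.|www\.)?(vimeo\.com|youtu(be\.com|\.be|be\.googleapis\.com))\/(video\/|embed\/|watch\?v=|v\/)?([A-Za-z0-9._%-]*)(&\S+)?/);
    if (!match) {
        // Unsupported URL: do not read stale values from the deprecated RegExp.$n statics
        return { type: null, id: null };
    }

    var type = null;
    if (match[3].indexOf('youtu') > -1) {
        type = 'youtube';
    } else if (match[3].indexOf('vimeo') > -1) {
        type = 'vimeo';
    }

    return {
        type: type,
        id: match[6]
    };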
}


Regex is wonderfully terse but can quickly get complicated.

http://jsfiddle.net/8nagx2sk/

function parseYouTube(str) {
    // link : //youtube.com/watch?v=Bo_deCOd1HU
    // share : //youtu.be/Bo_deCOd1HU
    // embed : //youtube.com/embed/Bo_deCOd1HU

    var re = /\/\/(?:www\.)?youtu(?:\.be|be\.com)\/(?:watch\?v=|embed\/)?([a-z0-9_\-]+)/i; 
    var matches = re.exec(str);
    return matches && matches[1];
}

function parseVimeo(str) {
    // embed & link: http://vimeo.com/86164897

    var re = /\/\/(?:www\.)?vimeo\.com\/([0-9a-z\-_]+)/i;
    var matches = re.exec(str);
    return matches && matches[1];
}

Sometimes simple code is nicer to your fellow developers.

https://jsfiddle.net/vkg02mhp/1/

// protocol and www neutral
function getVideoId(str, prefixes) {
  const cleaned = str.replace(/^(https?:)?\/\/(www\.)?/, '');
  for(const prefix of prefixes) {
    if (cleaned.startsWith(prefix))
      return cleaned.slice(prefix.length);
  }
  return undefined;
}

function getYouTubeId(url) {
  return getVideoId(url, [
    'youtube.com/watch?v=',
    'youtu.be/',
    'youtube.com/embed/'
  ]);
}

function getVimeoId(url) {
  return getVideoId(url, [
    'vimeo.com/'
  ]);
}

Which do you prefer to update?
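
One thing to watch with the prefix approach is that everything after the prefix is returned, so a watch URL with extra parameters would yield an ID like Bo_deCOd1HU&t=42. A minimal sketch of one way to trim that, assuming an ID never contains ?, &, # or / (getVideoIdTrimmed is a hypothetical name, not part of the code above):

```javascript
// Hypothetical variant of getVideoId: same prefix matching, but the result
// is cut at the first character that cannot be part of an ID.
function getVideoIdTrimmed(str, prefixes) {
  const cleaned = str.replace(/^(https?:)?\/\/(www\.)?/, '');
  for (const prefix of prefixes) {
    if (cleaned.startsWith(prefix)) {
      const rest = cleaned.slice(prefix.length);
      // Assumption: IDs never contain ?, &, # or /
      const id = rest.split(/[?&#\/]/)[0];
      return id || undefined;
    }
  }
  return undefined;
}

const youTubePrefixes = [
  'youtube.com/watch?v=',
  'youtu.be/',
  'youtube.com/embed/'
];

console.log(getVideoIdTrimmed('https://www.youtube.com/watch?v=Bo_deCOd1HU&t=42', youTubePrefixes));
console.log(getVideoIdTrimmed('//youtu.be/Bo_deCOd1HU?t=42', youTubePrefixes));
```

Both calls should log the bare ID without the trailing time parameter.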


I am not sure about your question 3), but provided that your assumption about the URL forms is correct, the regexes can be combined into one as follows:

/http:\/\/(?:www.)?(?:(vimeo).com\/(.*)|(youtube).com\/watch\?v=(.*?)&)/

You will get the match under different positions (1st and 2nd matches if vimeo, 3rd and 4th matches if youtube), so you just need to handle that.

Or, if you are quite sure that vimeo's id only includes numbers, then you can do:

/http:\/\/(?:www.)?(vimeo|youtube).com\/(?:watch\?v=)?(.*?)(?:\z|&)/

and the provider and the id will appear under the 1st and 2nd matches, respectively.
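
To illustrate handling the different match positions, here is a rough sketch built on a variant of the first combined regex. The variant is an assumption, not the expression above: it also accepts https, escapes the dots, and ends the ID at the first &, ? or # so that a trailing & is not required. parseCombined is a hypothetical name.

```javascript
// Variant of the first combined regex (assumptions: https allowed, dots
// escaped, ID ends at the first &, ? or #, or at the end of the input).
const combined = /https?:\/\/(?:www\.)?(?:(vimeo)\.com\/([^&?#]+)|(youtube)\.com\/watch\?v=([^&?#]+))/;

function parseCombined(url) {
  const m = combined.exec(url);
  if (!m) return null;
  // Groups 1 and 2 are filled for Vimeo, groups 3 and 4 for YouTube;
  // the groups of the branch that did not match are undefined.
  return m[1]
    ? { provider: m[1], id: m[2] }
    : { provider: m[3], id: m[4] };
}

console.log(parseCombined('http://vimeo.com/25451551'));
console.log(parseCombined('http://www.youtube.com/watch?v=My2FRPA3Gf8&feature=related'));
```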


Here is my regex

http://jsfiddle.net/csjwf/1/


For Vimeo, don't rely on regex, as Vimeo tends to change its URL patterns every now and then. As of October 2nd, 2017, Vimeo supports six URL schemes in total:

https://vimeo.com/*
https://vimeo.com/*/*/video/*
https://vimeo.com/album/*/video/*
https://vimeo.com/channels/*/*
https://vimeo.com/groups/*/videos/*
https://vimeo.com/ondemand/*/*

Instead, use their API to validate Vimeo URLs. The oEmbed API (see the docs) takes a URL, checks its validity, and returns an object with a bunch of video information (check out the dev page). Although it is not intended for this, we can easily use it to check whether a given URL is from Vimeo.

So, with Ajax it would look like this:

var VIMEO_BASE_URL = "https://vimeo.com/api/oembed.json?url=";
var yourTestUrl = "https://vimeo.com/23374724";

// Requires jQuery
$.ajax({
  // Encode the video URL so it survives as a query parameter
  url: VIMEO_BASE_URL + encodeURIComponent(yourTestUrl),
  type: 'GET',
  success: function(data) {
    if (data != null && data.video_id > 0) {
      // Valid Vimeo url
    } else {
      // not a valid Vimeo url
    }
  },
  error: function(data) {
    // not a valid Vimeo url
  }
});
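
For readers not using jQuery, the same idea can be sketched with fetch. This is only a sketch under assumptions: the function names are hypothetical, a global fetch is assumed to exist, and the response is assumed to carry a numeric video_id as in the callback above.

```javascript
const VIMEO_OEMBED_ENDPOINT = 'https://vimeo.com/api/oembed.json?url=';

// Build the request URL; the video URL is percent-encoded so that its own
// ":" and "/" characters do not break the query string.
function buildVimeoOembedUrl(videoUrl) {
  return VIMEO_OEMBED_ENDPOINT + encodeURIComponent(videoUrl);
}

// Same check as the success callback above.
function looksLikeVimeoVideo(data) {
  return data != null && data.video_id > 0;
}

// Hypothetical wrapper; assumes a runtime that provides a global fetch.
async function isVimeoUrl(videoUrl) {
  try {
    const response = await fetch(buildVimeoOembedUrl(videoUrl));
    if (!response.ok) return false;
    return looksLikeVimeoVideo(await response.json());
  } catch (err) {
    return false; // network failure or a body that is not JSON
  }
}
```

A caller would then write something like isVimeoUrl(someUrl).then(function (ok) { ... }), keeping in mind that every check costs a network round trip.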


About sawa's answer:

A small update to the second regex:

/http:\/\/(?:www\.)?(vimeo|youtube)\.com\/(?:watch\?v=)?(.*?)(?:\z|$|&)/

(Escaping the dots prevents matching URLs such as www_vimeo_com/…, and $ was added…)

Here is the same idea for matching the embed URLs:

/http:\/\/(?:www\.|player\.)?(vimeo|youtube)\.com\/(?:embed\/|video\/)?(.*?)(?:\z|$|\?)/
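
One caveat when running these in JavaScript: \z is not an end-of-input anchor there. In a regex without the u flag it matches a literal "z", so the lazy (.*?) stops at the first z inside an ID. A small sketch comparing the expression above with a variant that drops \z (the variant and the sample URL are assumptions made up for illustration):

```javascript
// The expression from the answer, unchanged.
const withZ = /http:\/\/(?:www\.)?(vimeo|youtube)\.com\/(?:watch\?v=)?(.*?)(?:\z|$|&)/;

// Assumed variant: \z removed, leaving $ as the only end-of-input anchor.
const withoutZ = /http:\/\/(?:www\.)?(vimeo|youtube)\.com\/(?:watch\?v=)?(.*?)(?:$|&)/;

// Made-up URL whose ID contains a lowercase z.
const sampleUrl = 'http://www.youtube.com/watch?v=abzDEF12345&feature=related';

console.log(withZ.exec(sampleUrl)[2]);    // cut short at the "z"
console.log(withoutZ.exec(sampleUrl)[2]); // full ID up to the "&"
```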


FWIW, I just used the following to validate and parse both YouTube and Vimeo URLs in an app. I'm sure you could add parentheses to parse out the specific things you're looking for...

/^(?:https?:\/\/)?(?:www\.)?(?:youtu\.be\/|youtube\.com\/(?:embed\/|v\/|watch\?v=|watch\?.+&v=))((\w|-){11})(?:\S+)?$|^(https?:\/\/)?(www\.)?(player\.)?vimeo\.com\/([a-z]*\/)*([0-9]{6,11})[?]?.*$/

^^ This is just a combination of 2 separate expressions using | (or) to join them. Here are the original 2 expressions separately:

/^(?:https?:\/\/)?(?:www\.)?(?:youtu\.be\/|youtube\.com\/(?:embed\/|v\/|watch\?v=|watch\?.+&v=))((\w|-){11})(?:\S+)?$/

/^(https?:\/\/)?(www\.)?(player\.)?vimeo\.com\/([a-z]*\/)*([0-9]{6,11})[?]?.*$/

I'm no expert, but it seems to work according to Rubular. Hopefully this helps someone out in the future.


3) Your regex does not match https URLs. I haven't tested it, but I guess the "http://" part would become "http(s)?://". Note that this would change the matching positions of the provider and id.
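
A quick sketch of that position shift, using the YouTube expression from the question with the dots escaped (an adjustment made here for illustration). A bare s? accepts the same URLs without adding a group, so the positions stay where they were:

```javascript
const httpsUrl = 'https://www.youtube.com/watch?v=My2FRPA3Gf8';

// Capturing (s)? adds a group: provider and id move from 1 and 2 to 2 and 3.
const shifted = httpsUrl.match(/http(s)?:\/\/(?:www\.)?(\w*)\.com\/.*v=(\w*)/);

// A bare s? matches the same URLs and keeps provider and id at 1 and 2.
const stable = httpsUrl.match(/https?:\/\/(?:www\.)?(\w*)\.com\/.*v=(\w*)/);

console.log(shifted[2], shifted[3]);
console.log(stable[1], stable[2]);
```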


Just in case, here is a PHP version:

/*
* parseVideo
* @param (string) $url 
* mi-ca.ch 27.05.2016
* parse vimeo & youtube id
* format url for iframe embed 
* https://regex101.com/r/lA0fP4/1
*/

function parseVideo($url) {
  $re = "/(http:|https:|)\\/\\/(player\\.|www\\.)?(vimeo\\.com|youtu(be\\.com|\\.be|be\\.googleapis\\.com))\\/(video\\/|embed\\/|watch\\?v=|v\\/)?([A-Za-z0-9._%-]*)(\\&\\S+)?/";

  // Bail out early when the URL is not a recognised YouTube or Vimeo link,
  // so $matches is never read without a match.
  if (!preg_match($re, $url, $matches)) {
    return false;
  }

  // strpos() returns false when the needle is missing, so compare strictly.
  if (strpos($matches[3], 'youtu') !== false) {
    $type = 'youtube';
    $src = 'https://www.youtube.com/embed/' . $matches[6];
  } else if (strpos($matches[3], 'vimeo') !== false) {
    $type = 'vimeo';
    $src = 'https://player.vimeo.com/video/' . $matches[6];
  } else {
    return false;
  }

  return array(
    'type' => $type,       // youtube or vimeo
    'id'   => $matches[6], // the video id
    'src'  => $src         // the src for iframe embed
  );
}


I had a task to enable adding Dropbox videos. So the same input should take an href, check it, and transform it into a playable link which I can then insert in .

const getPlayableUrl = (url) => {
    // Check YouTube and Vimeo
    const firstCheck = url.match(/(http:|https:|)\/\/(player\.|www\.)?(vimeo\.com|youtu(be\.com|\.be|be\.googleapis\.com))\/(video\/|embed\/|watch\?v=|v\/)?([A-Za-z0-9._%-]*)(&\S+)?/);

    if (firstCheck) {
        // Read groups from the match array rather than the deprecated RegExp.$n statics
        if (firstCheck[3].indexOf('youtu') > -1) {
            return '//www.youtube.com/embed/' + firstCheck[6];
        } else if (firstCheck[3].indexOf('vimeo') > -1) {
            return 'https://player.vimeo.com/video/' + firstCheck[6];
        }
    } else {
        // Check Dropbox: cut the URL off right after the video file extension
        let candidate = '';
        if (url.indexOf('.mp4') !== -1) {
            candidate = url.slice(0, url.indexOf('.mp4') + 4);
        } else if (url.indexOf('.m4v') !== -1) {
            candidate = url.slice(0, url.indexOf('.m4v') + 4);
        } else if (url.indexOf('.webm') !== -1) {
            candidate = url.slice(0, url.indexOf('.webm') + 5);
        }

        const secondCheck = candidate.match(/(http:|https:|)\/\/(player\.|www\.)?(dropbox\.com)\/(s\/|embed\/|watch\?v=|v\/)?([A-Za-z0-9._%-]*\/)?(.*)/);
        if (secondCheck) {
            // Optional groups are undefined when absent, so default them to ''
            return 'https://dropbox.com/' + (secondCheck[4] || '') + (secondCheck[5] || '') + secondCheck[6] + '?raw=1';
        } else {
            throw new Error('Not supported video resource.');
        }
    }
};


I based this on the previous answers, but I needed more out of the regex.

Maybe it worked in 2011, but in 2019 the syntax has changed a bit. So this is a refresh.

The regex lets us detect whether the URL is YouTube or Vimeo. I've added named capture groups to easily retrieve the video ID.

If it is run with the case-insensitive setting, please remove the (?i).

(?:(?i)(?:https:|http:)?\/\/)?(?:(?i)(?:www\.youtube\.com\/(?:embed\/|watch\?v=)|youtu\.be\/|youtube\.googleapis\.com\/v\/)(?<YoutubeID>[a-z0-9-_]{11,12})|(?:vimeo\.com\/|player\.vimeo\.com\/video\/)(?<VimeoID>[0-9]+))

https://regex101.com/r/PVdjg0/2
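
For JavaScript specifically, a possible adaptation is sketched below. It is an assumption-laden port rather than the exact expression above: the inline (?i) is replaced by the i flag, the protocol alternatives are shortened to https?:, and detectVideo is a hypothetical wrapper that reads the named groups.

```javascript
// Port of the regex above for JavaScript: case-insensitivity comes from the
// i flag, and the named groups are read through match.groups.
const videoRe = /(?:(?:https?:)?\/\/)?(?:(?:www\.youtube\.com\/(?:embed\/|watch\?v=)|youtu\.be\/|youtube\.googleapis\.com\/v\/)(?<YoutubeID>[a-z0-9_-]{11,12})|(?:vimeo\.com\/|player\.vimeo\.com\/video\/)(?<VimeoID>[0-9]+))/i;

// Hypothetical helper: returns { type, id } or null when nothing matches.
function detectVideo(url) {
  const m = videoRe.exec(url);
  if (!m) return null;
  const { YoutubeID, VimeoID } = m.groups;
  // Only the group of the branch that matched is defined.
  return YoutubeID
    ? { type: 'youtube', id: YoutubeID }
    : { type: 'vimeo', id: VimeoID };
}

console.log(detectVideo('https://www.youtube.com/watch?v=My2FRPA3Gf8'));
console.log(detectVideo('//player.vimeo.com/video/25451551'));
```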


Devs, use this regex. It works smoothly (React JS, JavaScript):

^(http\:\/\/|https\:\/\/)?((www\.)?(vimeo\.com\/)([0-9]+)$)|((www\.youtube\.com|youtu\.be)\/.+$)