Regular expression for youtube links

Developer https://www.devze.com 2023-01-16 01:09 Source: Internet
Does someone have a regular expression that extracts the link to a YouTube video (not an embedded object) from (almost) all the possible ways of linking to YouTube?

I think this is a pretty common problem, and I'm sure there are a lot of ways to write such a link.

A starting point would be:

  • http://www.youtube.com/watch?v=iwGFalTRHDA
  • http://www.youtube.com/watch?v=iwGFalTRHDA&feature=related
  • http://youtu.be/iwGFalTRHDA
  • http://youtu.be/n17B_uFF4cA
  • http://www.youtube.com/embed/watch?feature=player_embedded&v=r5nB9u4jjy4
  • http://www.youtube.com/watch?v=t-ZRX8984sc
  • http://youtu.be/t-ZRX8984sc
  • ... please add more possible links and/or regular expressions to detect them.


So far I have this regular expression working for the examples I posted; it captures the ID in the first group:

http(?:s?):\/\/(?:www\.)?youtu(?:be\.com\/watch\?v=|\.be\/)([\w\-\_]*)(&(amp;)?[\w\?=]*)?
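
To check a pattern like this against the example list, a small harness helps. The sketch below is a Python transcription; the names `ASKER_PATTERN` and `extract_id` are mine, and `\_` is dropped from the character class because `\w` already covers the underscore. As far as I can tell, the `/embed/watch?...` example is not matched by the pattern as written, because it requires `watch?v=` directly after `youtube.com/`.

```python
import re

# Transcription of the pattern above (assumption: Python's re module is an
# acceptable stand-in for whatever engine the asker used).
ASKER_PATTERN = re.compile(
    r"http(?:s?):\/\/(?:www\.)?youtu(?:be\.com\/watch\?v=|\.be\/)([\w\-]*)(&(amp;)?[\w\?=]*)?"
)

def extract_id(url):
    """Return the first capture group (the video ID), or None if nothing matches."""
    match = ASKER_PATTERN.search(url)
    return match.group(1) if match else None

examples = [
    "http://www.youtube.com/watch?v=iwGFalTRHDA",
    "http://www.youtube.com/watch?v=iwGFalTRHDA&feature=related",
    "http://youtu.be/iwGFalTRHDA",
    "http://youtu.be/n17B_uFF4cA",
    "http://www.youtube.com/embed/watch?feature=player_embedded&v=r5nB9u4jjy4",
    "http://www.youtube.com/watch?v=t-ZRX8984sc",
    "http://youtu.be/t-ZRX8984sc",
]

for url in examples:
    print(url, "->", extract_id(url))
```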


You can use this expression below.

(?:https?:\/\/)?(?:www\.)?youtu\.?be(?:\.com)?\/?.*(?:watch|embed)?(?:.*v=|v\/|\/)([\w\-_]+)\&?

I'm using it, and it covers the most commonly used URLs. I'll keep updating it in this Gist. You can test it with this tool.


I like @brunodles's solution the most, but it still matches non-video links such as https://www.youtube.com/feed/subscriptions

I went with this solution:

(?:https?:\/\/)?(?:www\.)?youtu(?:\.be\/|be\.com\/\S*(?:watch|embed)(?:(?:(?=\/[-a-zA-Z0-9_]{11,}(?!\S))\/)|(?:\S*v=|v\/)))([-a-zA-Z0-9_]{11,})

It can also be used to match multiple whitespace-separated links. The video ID is captured in the first group.

Tested with the following urls:

youtu.be/iwGFalTRHDA
youtube.com/watch?v=iwGFalTRHDA
www.youtube.com/watch?v=iwGFalTRHDA
http://www.youtube.com/watch?v=iwGFalTRHDA
https://www.youtube.com/watch?v=iwGFalTRHDA
https://www.youtube.com/watch?v=MoBL33GT9S8&feature=share
https://www.youtube.com/embed/watch?feature=player_embedded&v=iwGFalTRHDA
https://www.youtube.com/embed/watch?v=iwGFalTRHDA
https://www.youtube.com/embed/v=iwGFalTRHDA
https://www.youtube.com/watch/iwGFalTRHDA
http://www.youtube.com/attribution_link?u=/watch?v=aGmiw_rrNxk&feature=share
https://m.youtube.com/watch?v=iwGFalTRHDA

// will not match
https://www.youtube.com/feed/subscriptions
https://www.youtube.com/channel/UCgc00bfF_PvO_2AvqJZHXFg
https://www.youtube.com/c/NatGeoEdOrg/videos

https://regex101.com/r/rq2KLv/1
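
A minimal sketch of the multiple-link case, assuming Python's `re` module; the name `LINK_PATTERN` and the sample text are mine, and the dot in `be.com` is written escaped here. With a single capture group, `findall` returns just the captured IDs.

```python
import re

# The pattern from this answer, split over two lines for readability.
LINK_PATTERN = re.compile(
    r"(?:https?:\/\/)?(?:www\.)?youtu(?:\.be\/|be\.com\/\S*(?:watch|embed)"
    r"(?:(?:(?=\/[-a-zA-Z0-9_]{11,}(?!\S))\/)|(?:\S*v=|v\/)))([-a-zA-Z0-9_]{11,})"
)

# Three whitespace-separated links; the last one is not a video link.
text = (
    "youtu.be/iwGFalTRHDA "
    "https://www.youtube.com/watch?v=MoBL33GT9S8&feature=share "
    "https://www.youtube.com/feed/subscriptions"
)

video_ids = LINK_PATTERN.findall(text)
print(video_ids)
```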


With a friend, I improved the expressions posted above for an IRC script I wrote, so that it recognizes even links without http at all. It has passed every stress test so far, including garbled text with barely recognizable YouTube URLs, so here it is:

~(?:https?://)?(?:www\.)?youtu(?:be\.com/watch\?(?:.*?&(?:amp;)?)?v=|\.be/)([\w\-]+)(?:&(?:amp;)?[\w\?=]*)?~


I tested all the regular expressions shown here, and none of them covered all the URL types my client was using.

I built this pretty much through trial and error, but it seems to work with all the patterns that Poppy Deejay posted.

"(?:.+?)?(?:\/v\/|watch\/|\?v=|\&v=|youtu\.be\/|\/v=|^youtu\.be\/)([a-zA-Z0-9_-]{11})+"

Maybe it helps someone who is in a situation similar to the one I was in today ;)


Piggybacking on Fanmade's answer, this covers the links below, including the URL-encoded version of attribution_link URLs:

(?:.+?)?(?:\/v\/|watch\/|\?v=|\&v=|youtu\.be\/|\/v=|^youtu\.be\/|watch\%3Fv\%3D)([a-zA-Z0-9_-]{11})+



https://www.youtube.com/attribution_link?a=tolCzpA7CrY&u=%2Fwatch%3Fv%3DMoBL33GT9S8%26feature%3Dshare
https://www.youtube.com/watch?v=MoBL33GT9S8&feature=share
http://www.youtube.com/watch?v=iwGFalTRHDA 
https://www.youtube.com/watch?v=iwGFalTRHDA 
http://www.youtube.com/watch?v=iwGFalTRHDA&feature=related 
http://youtu.be/iwGFalTRHDA 
http://www.youtube.com/embed/watch?feature=player_embedded&v=iwGFalTRHDA
http://www.youtube.com/embed/watch?v=iwGFalTRHDA
http://www.youtube.com/embed/v=iwGFalTRHDA
http://www.youtube.com/watch?feature=player_embedded&v=iwGFalTRHDA
http://www.youtube.com/watch?v=iwGFalTRHDA
www.youtube.com/watch?v=iwGFalTRHDA 
www.youtu.be/iwGFalTRHDA 
youtu.be/iwGFalTRHDA 
youtube.com/watch?v=iwGFalTRHDA 
http://www.youtube.com/watch/iwGFalTRHDA
http://www.youtube.com/v/iwGFalTRHDA
http://www.youtube.com/v/i_GFalTRHDA
http://www.youtube.com/watch?v=i-GFalTRHDA&feature=related 
http://www.youtube.com/attribution_link?u=/watch?v=aGmiw_rrNxk&feature=share&a=9QlmP1yvjcllp0h3l0NwuA
http://www.youtube.com/attribution_link?a=fF1CWYwxCQ4&u=/watch?v=qYr8opTPSaQ&feature=em-uploademail
http://www.youtube.com/attribution_link?a=fF1CWYwxCQ4&feature=em-uploademail&u=/watch?v=qYr8opTPSaQ
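
One alternative to adding an encoded delimiter to the pattern is to percent-decode the URL first and then apply Fanmade's original expression. The sketch below assumes Python; `urllib.parse.unquote` is the standard-library decoder, while `FANMADE_PATTERN` and `extract_id` are names I made up. Decoding the whole URL may not be appropriate in every context, so treat this as an illustration only.

```python
import re
from urllib.parse import unquote

# Fanmade's pattern, without the extra watch%3Fv%3D alternative.
FANMADE_PATTERN = re.compile(
    r"(?:.+?)?(?:\/v\/|watch\/|\?v=|\&v=|youtu\.be\/|\/v=|^youtu\.be\/)([a-zA-Z0-9_-]{11})+"
)

def extract_id(url):
    """Percent-decode the URL, then return the captured video ID or None."""
    decoded = unquote(url)
    match = FANMADE_PATTERN.search(decoded)
    return match.group(1) if match else None

encoded = (
    "https://www.youtube.com/attribution_link"
    "?a=tolCzpA7CrY&u=%2Fwatch%3Fv%3DMoBL33GT9S8%26feature%3Dshare"
)
print(extract_id(encoded))
```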


I've been having problems lately with the attribution_link URLs, so I tried making my own regex that works for those too.

Here is my regex string:

(https?://)?(www\\.)?(youtu\\.be/|youtube\\.com/)?((.+/)?(watch(\\?v=|.+&v=))?(v=)?)([\\w_-]{11})(&.+)?

and here are some test cases I've tried:

http://www.youtube.com/watch?v=iwGFalTRHDA 
https://www.youtube.com/watch?v=iwGFalTRHDA 
http://www.youtube.com/watch?v=iwGFalTRHDA&feature=related 
http://youtu.be/iwGFalTRHDA 
http://www.youtube.com/embed/watch?feature=player_embedded&v=iwGFalTRHDA
http://www.youtube.com/embed/watch?v=iwGFalTRHDA
http://www.youtube.com/embed/v=iwGFalTRHDA
http://www.youtube.com/watch?feature=player_embedded&v=iwGFalTRHDA
http://www.youtube.com/watch?v=iwGFalTRHDA
www.youtube.com/watch?v=iwGFalTRHDA 
www.youtu.be/iwGFalTRHDA 
youtu.be/iwGFalTRHDA 
youtube.com/watch?v=iwGFalTRHDA 
http://www.youtube.com/watch/iwGFalTRHDA
http://www.youtube.com/v/iwGFalTRHDA
http://www.youtube.com/v/i_GFalTRHDA
http://www.youtube.com/watch?v=i-GFalTRHDA&feature=related 
http://www.youtube.com/attribution_link?u=/watch?v=aGmiw_rrNxk&feature=share&a=9QlmP1yvjcllp0h3l0NwuA
http://www.youtube.com/attribution_link?a=fF1CWYwxCQ4&u=/watch?v=qYr8opTPSaQ&feature=em-uploademail
http://www.youtube.com/attribution_link?a=fF1CWYwxCQ4&feature=em-uploademail&u=/watch?v=qYr8opTPSaQ

Also remember to check the string you get for your video URL: sometimes it contains percent-encoded characters. If so, just do this

url = [url stringByReplacingPercentEscapesUsingEncoding:NSUTF8StringEncoding];

and it should fix it.

Remember also that the YouTube video ID is now in capture group 9.

NSRange youtubeKey = [result rangeAtIndex:9]; //the youtube key
NSString * strKey = [url substringWithRange:youtubeKey] ;
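
To see why the ID lands in group 9, it helps to count the opening parentheses. The sketch below is a transcription for Python's `re` (single backslashes, host alternative written as `youtu\.be`); `NINE_GROUPS` and `ONE_GROUP` are hypothetical names. The second variant marks every group except the ID as non-capturing, which is one possible way to avoid counting.

```python
import re

# Every parenthesis captures, so the 11-character ID is the ninth group.
NINE_GROUPS = re.compile(
    r"(https?://)?(www\.)?(youtu\.be/|youtube\.com/)?"
    r"((.+/)?(watch(\?v=|.+&v=))?(v=)?)([\w_-]{11})(&.+)?"
)

# Same structure with (?:...) groups, so the ID becomes group 1.
ONE_GROUP = re.compile(
    r"(?:https?://)?(?:www\.)?(?:youtu\.be/|youtube\.com/)?"
    r"(?:(?:.+/)?(?:watch(?:\?v=|.+&v=))?(?:v=)?)([\w_-]{11})(?:&.+)?"
)

url = (
    "http://www.youtube.com/attribution_link"
    "?a=fF1CWYwxCQ4&u=/watch?v=qYr8opTPSaQ&feature=em-uploademail"
)

id_from_nine = NINE_GROUPS.match(url).group(9)
id_from_one = ONE_GROUP.match(url).group(1)
print(id_from_nine, id_from_one)
```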


It'd be the longest RegEx in the world if you managed to cover all link formats, but here's one to get you started which will cover the first couple of link formats:

http://(www\.)?youtube\.com/watch\?.*v=([a-zA-Z0-9_-]+).*

The second group will match the video ID if you need to get that out.
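
For plain watch URLs, a URL parser is an alternative to a regex. This is a different technique from the one in this answer: the sketch below uses Python's standard `urllib.parse`, and the function name and the list of accepted hosts are my assumptions. It only handles URLs that include a scheme and use the `/watch?v=` form.

```python
from urllib.parse import urlparse, parse_qs

# Hosts accepted by this sketch (an assumption, not an exhaustive list).
WATCH_HOSTS = ("youtube.com", "www.youtube.com", "m.youtube.com")

def video_id_from_watch_url(url):
    """Return the v= query parameter of a /watch URL, or None."""
    parts = urlparse(url)
    if parts.hostname not in WATCH_HOSTS or parts.path != "/watch":
        return None
    values = parse_qs(parts.query).get("v")
    return values[0] if values else None

print(video_id_from_watch_url(
    "http://www.youtube.com/watch?feature=player_embedded&v=iwGFalTRHDA"
))
```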


(?:http?s?:\/\/)?(?:www.)?(?:m.)?(?:music.)?youtu(?:\.?be)(?:\.com)?(?:(?:\w*.?:\/\/)?\w*.?\w*-?.?\w*\/(?:embed|e|v|watch|.*\/)?\??(?:feature=\w*\.?\w*)?&?(?:v=)?\/?)([\w\d_-]{11})(?:\S+)?

https://regex101.com/r/nJzgG0/3

Detects YouTube and YouTube Music links in any string.


I took all variants from here:

https://gist.github.com/rodrigoborgesdeoliveira/987683cfbfcc8d800192da1e73adc486#file-youtubeurlformats-txt

And built this regexp (YouTube ID is in group 2):

(\/|%3D|v=|vi=)([0-9A-Za-z_-]{11})[%#?&\s]

Check it here: https://regexr.com/4u4ud

Edit: Works for any single string without line breaks.
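
One thing to watch for: the pattern ends with a required delimiter (`[%#?&\s]`), so an ID that sits at the very end of the input is not matched unless something follows it. A minimal Python sketch, assuming the character class is written as `[0-9A-Za-z_-]`; `ID_PATTERN` and `find_id` are hypothetical names, and appending a newline is just one possible workaround.

```python
import re

ID_PATTERN = re.compile(r"(\/|%3D|v=|vi=)([0-9A-Za-z_-]{11})[%#?&\s]")

def find_id(text):
    """Append a newline so an ID at the end of the input still has a delimiter."""
    match = ID_PATTERN.search(text + "\n")
    return match.group(2) if match else None

print(ID_PATTERN.search("http://youtu.be/iwGFalTRHDA"))  # no trailing delimiter
print(find_id("http://youtu.be/iwGFalTRHDA"))
```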


I'm working with links of this kind:

http://www.youtube.com/v/M-faNJWc9T0?fs=1&rel=0

And here's the regex I'm using to get the ID from it:

"(.+?)(\/v/)([a-zA-Z0-9_-]{11})+"


This iterates on the existing answers and handles edge cases better (for example, http://thisisnotyoutu.be/thing).

/(?:https?:\/\/|www\.|m\.|^)youtu(?:be\.com\/watch\?(?:.*?&(?:amp;)?)?v=|\.be\/)([\w\-]+)(?:&(?:amp;)?[\w\?=]*)?/
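
The edge case can be reproduced with a short Python sketch. `LOOSE` is the IRC-script pattern from the earlier answer, where every prefix is optional, and `STRICT` is the pattern above; both names are mine. The difference is that `STRICT` requires `youtu` to come directly after a scheme, `www.`, `m.`, or the start of the string.

```python
import re

# Earlier pattern: the scheme and www. prefixes are optional, so "youtu"
# may appear in the middle of another host name.
LOOSE = re.compile(
    r"(?:https?://)?(?:www\.)?youtu(?:be\.com/watch\?(?:.*?&(?:amp;)?)?v=|\.be/)"
    r"([\w\-]+)(?:&(?:amp;)?[\w\?=]*)?"
)

# Pattern from this answer: "youtu" must follow a known prefix or line start.
STRICT = re.compile(
    r"(?:https?:\/\/|www\.|m\.|^)youtu(?:be\.com\/watch\?(?:.*?&(?:amp;)?)?v=|\.be\/)"
    r"([\w\-]+)(?:&(?:amp;)?[\w\?=]*)?"
)

lookalike = "http://thisisnotyoutu.be/thing"
print(LOOSE.search(lookalike))   # matches and captures "thing"
print(STRICT.search(lookalike))  # no match
```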


Here is a complete solution for getting the YouTube video ID in Java or on Android. I haven't found any link that doesn't work with this function:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Returns the 11-character video ID, or "" if the URL is null, blank, or not recognized.
public static String getValidYoutubeVideoId(String youtubeUrl)
{
    if(youtubeUrl == null || youtubeUrl.trim().isEmpty())
    {
        return "";
    }
    youtubeUrl = youtubeUrl.trim();
    String regexPattern = "^(?:https?:\\/\\/)?(?:[0-9A-Z-]+\\.)?(?:youtu\\.be\\/|youtube\\.com\\S*[^\\w\\-\\s])([\\w\\-]{11})(?=[^\\w\\-]|$)(?![?=&+%\\w]*(?:['\"][^<>]*>|<\\/a>))[?=&+%\\w]*";
    Pattern regexCompiled = Pattern.compile(regexPattern, Pattern.CASE_INSENSITIVE);
    Matcher regexMatcher = regexCompiled.matcher(youtubeUrl);
    // Group 1 is mandatory in the pattern, so it is always set when find() succeeds.
    return regexMatcher.find() ? regexMatcher.group(1) : "";
}
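
A quick way to probe this pattern is to port it to Python's `re`, which as far as I know accepts the same syntax for this expression. The function name below is hypothetical. Worth noting: the pattern accepts any 11-character path segment under youtube.com, so, as far as I can tell, a channel URL whose name happens to be 11 characters long is reported as a video ID.

```python
import re

# Port of the Java pattern; IGNORECASE mirrors Pattern.CASE_INSENSITIVE.
YOUTUBE_ID = re.compile(
    r"""^(?:https?:\/\/)?(?:[0-9A-Z-]+\.)?(?:youtu\.be\/|youtube\.com\S*[^\w\-\s])([\w\-]{11})(?=[^\w\-]|$)(?![?=&+%\w]*(?:['"][^<>]*>|<\/a>))[?=&+%\w]*""",
    re.IGNORECASE,
)

def get_valid_youtube_video_id(url):
    """Return the captured ID, or "" for None, blank, or unrecognized input."""
    if url is None or not url.strip():
        return ""
    match = YOUTUBE_ID.search(url.strip())
    return match.group(1) if match else ""

print(get_valid_youtube_video_id("https://m.youtube.com/watch?v=iwGFalTRHDA"))
print(get_valid_youtube_video_id("https://www.youtube.com/c/NatGeoEdOrg/videos"))
```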


This is my answer for use in Scala. It is useful for extracting the 11-character ID from a YouTube URL.

"https?://(?:[0-9a-zA-Z-]+.)?(?:www.youtube.com/|youtu.be\S*[^\w-\s])([\w -]{11})(?=[^\w-]|$)(?![?=&+%\w](?:[\'"][^<>]>|))[?=&+%\w-]*"

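import org.apache.spark.sql.expressions.UserDefinedFunction
import org.apache.spark.sql.functions.udf

// Spark UDF: returns the video ID when the whole string is a YouTube URL, otherwise the input unchanged.
def getVideoLinkWR: UserDefinedFunction = udf((videoLink: String) => {
    val youtubeRgx = """https?://(?:[0-9a-zA-Z-]+\.)?(?:youtu\.be/|youtube\.com\S*[^\w\-\s])([\w \-]{11})(?=[^\w\-]|$)(?![?=&+%\w]*(?:[\'"][^<>]*>|</a>))[?=&+%\w-./]*""".r
    videoLink match {
        case youtubeRgx(a) => a
        case _ => videoLink
    }
})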


Convert a YouTube video URL to an iframe-compatible embed link:

REGEX: https://regex101.com/r/LeZ9WH/2/

http://www.youtube.com/watch?v=iwGFalTRHDA
http://www.youtube.com/watch?v=iwGFalTRHDA&feature=related
http://youtu.be/iwGFalTRHDA
http://youtu.be/n17B_uFF4cA
http://www.youtube.com/embed/watch?feature=player_embedded&v=r5nB9u4jjy4
http://www.youtube.com/watch?v=t-ZRX8984sc
http://youtu.be/t-ZRX8984sc
https://youtu.be/2sFlFPmUfNo?t=1

PHP function example:

if (!function_exists('clean_youtube_link')) {

        /**
         * @param $link
         * @return string|string[]|null
         */
        function clean_youtube_link($link)
        {
            return preg_replace(
                '#(.+?)(\/)(watch\x3Fv=)?(embed\/watch\x3Ffeature\=player_embedded\x26v=)?([a-zA-Z0-9_-]{11})+#',
                "https://www.youtube.com/embed/$5",
                $link
            );
        }
}


This should work for almost all youtube links when extracting from a string:

((?:https?:)?\/\/)?((?:www|m)\.)?((?:youtube\.com|youtu.be))(\/(?:[\w\-]+\?v=|embed\/|v\/)?)([\w\-]{10}).\b


    var isValidYoutubeLink: Bool{
        // working for all the youtube url's
        NSPredicate(format: "SELF MATCHES %@", "(?:http?s?:\\/\\/)?(?:www.)?(?:m.)?(?:music.)?youtu(?:\\.?be)(?:\\.com)?(?:(?:\\w*.?:\\/\\/)?\\w*.?\\w*-?.?\\w*\\/(?:embed|e|v|watch|.*\\/)?\\??(?:feature=\\w*\\.?\\w*)?&?(?:v=)?\\/?)([\\w\\d_-]{11})(?:\\S+)?").evaluate(with: self)
    }


With this JavaScript regex, the first capture group is the video ID:

^(?:https?:)?(?:\/\/)?(?:www\.)?(?:youtu\.be\/|youtube(?:\-nocookie)?\.(?:[A-Za-z]{2,4}|[A-Za-z]{2,3}\.[A-Za-z]{2})\/)(?:watch|embed\/|vi?\/)*(?:\?[\w=&]*vi?=)?([^#&\?\/]{11}).*$
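
The pattern uses no JavaScript-specific syntax that I can see, so it can be tried in Python's `re` as well. A small sketch with a hypothetical `NOCOOKIE_PATTERN` name; the youtube-nocookie.com URL is an example I constructed to exercise the `(?:\-nocookie)?` branch.

```python
import re

NOCOOKIE_PATTERN = re.compile(
    r"^(?:https?:)?(?:\/\/)?(?:www\.)?"
    r"(?:youtu\.be\/|youtube(?:\-nocookie)?\.(?:[A-Za-z]{2,4}|[A-Za-z]{2,3}\.[A-Za-z]{2})\/)"
    r"(?:watch|embed\/|vi?\/)*(?:\?[\w=&]*vi?=)?([^#&\?\/]{11}).*$"
)

def first_capture(url):
    """Return the first capture group, or None if the URL does not match."""
    match = NOCOOKIE_PATTERN.match(url)
    return match.group(1) if match else None

for url in (
    "https://www.youtube-nocookie.com/embed/iwGFalTRHDA",
    "http://www.youtube.com/watch?v=iwGFalTRHDA&feature=related",
    "youtu.be/iwGFalTRHDA",
    "https://www.youtube.com/feed/subscriptions",
):
    print(url, "->", first_capture(url))
```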


(?-s)^https?\W+(?:www\.|m\.|music\.)*youtu\.?be(?:\.com|\/watch|\/o?embed|\/shorts|\/attribution_link\?[&\w\-=]*[au]=|\/ytsc\w+|[\?&\/]+[ve]i?\b|\?feature=\w+|-nocookie)*[\/=]([a-z\d\-_]{11})[\?&#% \t ] *.*$

or

(?-s)^(?:(?!https?[:\/]|www\.|m\.yo|music\.yo|youtu\.?be[\/\.]|watch[\/\?]|embed\/)\V)*(?:https?[:\/]+|www\.|m\.|music\.)+youtu\.?be(?:\.com\/|watch|o?embed(?:\/|\?url=\S+?)?|shorts|attribution_link\?[&\w\-=]*[au]=\/?|ytsc\w+|[\?&]*[ve]i?\b|\?feature=\w+|[\?&]time_continue=\d+|-nocookie|%[23][56FD])*(?:[\/=]|%2F|%3D)([a-z\d\-_]{11})[\?&#% \t ]? *.*$

(The part >>#% \t⠀ ]<< should contain a non-breaking space, typed as Alt+255, but Stack Overflow can't display it.) (The matched string can be replaced with \1; the resulting lines can then be sorted and consecutive duplicates collapsed with:)

V█(?-i)^([A-Za-z\d\-_]{11})(?:\v+\1)*$
>█https:\/\/youtu\.be\/\1

(The dot . can match any symbol; \V or [^\r\n] can match any symbol except special ones, emoji and others; this >> [^!-⠀:/‽|\s] << can catch some emoji.)

https://youtu.be/x26ANNC3C-8 • ♾ 