How do I use Regex to find the ID in a YouTube link?

Developer https://www.devze.com 2022-12-26 16:15 Source: Internet

When I try to extract this video ID (AIiMa2Fe-ZQ) with a regular expression, I can't get the dash and all the letters after it.

>>> import re
>>> id = re.search(r'(?<=\?v\=)\w+', 'http://www.youtube.com/watch?v=AIiMa2Fe-ZQ')
>>> print(id.group(0))
AIiMa2Fe

Instead of \w+, use the pattern below. The word character class (\w) doesn't include a dash; it only matches [a-zA-Z0-9_].

[\w-]+

I don't know the exact pattern for YouTube IDs, but just add "-" to the allowed characters, since \w does not match it:

import re
id = re.search(r'(?<=\?v\=)[\w-]+', 'http://www.youtube.com/watch?v=AIiMa2Fe-ZQ')
print(id.group(0))

I have edited the above because, as it turns out:

>>> re.search(r"[\w|-]", "|").group(0)
'|'

Inside a character class, "|" does not act as the alternation operator; it matches a literal pipe character. My apologies.

>>> re.search(r'(?<=v=)[\w-]+', 'http://www.youtube.com/watch?v=AIiMa2Fe-ZQ').group()
'AIiMa2Fe-ZQ'

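\w is shorthand for [a-zA-Z0-9_] in Python 2.x. In Python 3, \w also matches Unicode word characters by default, so you have to pass the re.A flag to get the same behavior. Your video ID clearly contains an additional character, the hyphen, which has to be added to the class. I've also removed the redundant escape backslashes from the lookbehind.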
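To illustrate the difference the re.A (re.ASCII) flag makes in Python 3, here is a minimal sketch. The sample string is made up for illustration; it contains one accented letter, a digit, an underscore, and a hyphen.

```python
import re

# Hypothetical sample text: "caf" + e-acute, then "_1", a hyphen, and "x"
text = 'caf\u00e9_1-x'

# Default str pattern in Python 3: \w also matches Unicode letters
unicode_words = re.findall(r'\w+', text)

# With re.ASCII, \w is limited to [a-zA-Z0-9_]
ascii_words = re.findall(r'\w+', text, flags=re.ASCII)

print(unicode_words)  # the accented letter stays inside the first word
print(ascii_words)    # the accented letter splits the first word
```

In both cases the hyphen is never matched by \w, which is why the character class has to name it explicitly.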

Use the urlparse module (urllib.parse in Python 3) instead of a regex for this kind of thing.

from urllib.parse import urlparse, parse_qs

url = 'http://www.youtube.com/watch?v=AIiMa2Fe-ZQ'

parsed_url = urlparse(url)
if 'youtube.com' in parsed_url.netloc and parsed_url.path == '/watch':
    # Look for the video ID in the query string first (watch?v=...)
    video = parse_qs(parsed_url.query).get('v')

    if video is None:
        # Fall back to the fragment form (watch#!v=...)
        video = parse_qs(parsed_url.fragment.strip('!')).get('v')

    if video is not None:
        print(video[0])

EDIT: Updated for the upcoming new YouTube URL format.
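As a rough sketch of how the URL-parsing approach could be wrapped for reuse under Python 3, where the parsing functions live in urllib.parse: the function name extract_video_id and the sample URLs below are assumptions for illustration, not part of the original answer.

```python
from urllib.parse import urlparse, parse_qs


def extract_video_id(url):
    """Return the YouTube video ID from a watch URL, or None if absent."""
    parsed = urlparse(url)
    if 'youtube.com' not in parsed.netloc or parsed.path != '/watch':
        return None

    # Standard form: http://www.youtube.com/watch?v=ID
    values = parse_qs(parsed.query).get('v')
    if values is None:
        # Fragment form: http://www.youtube.com/watch#!v=ID
        values = parse_qs(parsed.fragment.strip('!')).get('v')

    return values[0] if values else None


print(extract_video_id('http://www.youtube.com/watch?v=AIiMa2Fe-ZQ'))
print(extract_video_id('http://www.youtube.com/watch#!v=AIiMa2Fe-ZQ'))
```

Because parse_qs splits the parameters itself, extra parameters after the ID (for example &feature=related) do not end up in the result.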

/(?:/v/|/watch\?v=|/watch#!v=)([A-Za-z0-9_-]+)/

Explanation of the RE:

There are three alternate YouTube formats: /v/[ID], watch?v=[ID], and the new AJAX form watch#!v=[ID]. This RE captures all three. There is also a new YouTube URL for user pages, of the form /user/[user]?content={complex URI}. That form is not captured by this regex.
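The same pattern can be tried out in Python. This is a minimal sketch: the surrounding slash delimiters are dropped because Python's re module does not use them, and the names YOUTUBE_ID_RE and find_video_id, as well as the user-page URL, are hypothetical.

```python
import re

# Same alternation as above: /v/ID, /watch?v=ID, or /watch#!v=ID
YOUTUBE_ID_RE = re.compile(r'(?:/v/|/watch\?v=|/watch#!v=)([A-Za-z0-9_-]+)')


def find_video_id(url):
    """Return the captured video ID, or None if no format matches."""
    match = YOUTUBE_ID_RE.search(url)
    return match.group(1) if match else None


for url in (
    'http://www.youtube.com/v/AIiMa2Fe-ZQ',
    'http://www.youtube.com/watch?v=AIiMa2Fe-ZQ',
    'http://www.youtube.com/watch#!v=AIiMa2Fe-ZQ',
    'http://www.youtube.com/user/someuser?content=abc',  # hypothetical user page
):
    print(url, '->', find_video_id(url))
```

The first three URLs yield the ID; the user-page URL yields None, as described above.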

I'd try this:

>>> import re
>>> a = re.compile(r'.*(\-\w+)$')
>>> a.search('http://www.youtube.com/watch?v=AIiMa2Fe-ZQ').group(1)
'-ZQ'
