How can I fix this regex which extracts a tweet id from a Twitter URL?

Source: https://www.devze.com, 2023-02-13 19:06 (origin: web)
I am trying to write a regex that will extract a tweet id from a Twitter URL.

I have this one, which works when the Twitter username has a number in it:

'.*?\\d+.*?(\\d+)'

ruby-1.9.2-p0 > Regexp.new('.*?\\d+.*?(\\d+)',Regexp::IGNORECASE).match('https://twitter.com/#!/sportsguy33/status/41257488166686720')[1]
 => "41257488166686720" 
ruby-1.9.2-p0 > Regexp.new('.*?\\d+.*?(\\d+)',Regexp::IGNORECASE).match('http://twitter.com/#!/dailythunder/status/41382006113841153')[1]
 => "3" 

And this one, which works when the Twitter username doesn't have a number in it:

'.*?(\\d+)'

ruby-1.9.2-p0 > Regexp.new('.*?(\\d+)',Regexp::IGNORECASE).match('https://twitter.com/#!/sportsguy33/status/41257488166686720')[1]
 => "33" 
ruby-1.9.2-p0 > Regexp.new('.*?(\\d+)',Regexp::IGNORECASE).match('http://twitter.com/#!/dailythunder/status/41382006113841153')[1]
 => "41382006113841153" 

How can I write one that will work in either case?


If the tweet ID is the last part of the URL, you can use:

'\/(\d+)$'

In Ruby, $ anchors the match at the end of a line; use \z instead if the match must sit at the very end of the string.
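
A minimal sketch of this approach, run against the two URLs from the question. The helper name tweet_id_at_end is hypothetical, and the pattern is written with \z so that it only matches at the very end of the string:

```ruby
# Hypothetical helper: returns the trailing all-digit path segment, or nil
# when the URL does not end in "/<digits>".
def tweet_id_at_end(url)
  match = %r{/(\d+)\z}.match(url)
  match && match[1]
end

puts tweet_id_at_end('https://twitter.com/#!/sportsguy33/status/41257488166686720')
puts tweet_id_at_end('http://twitter.com/#!/dailythunder/status/41382006113841153')
```

Because the slash must sit directly before the digits, the 33 in sportsguy33 is never captured. This only holds while the ID really is the final path segment.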


I just released a gem, tweet_url, for parsing Twitter URLs.

require 'tweet_url'
tweet_url = TweetUrl.parse('https://twitter.com/yukihiro_matz/status/755950562227605504')
tweet_url.status_id  #=> 755950562227605504

Heads up: some URLs look like https://twitter.com/sferik/status/540897316908331009/photo/1, so we cannot simply extract the last numeric part.
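
For a regex-only alternative that copes with such trailing segments, one option is to key on the /status/ path segment rather than on the position of the digits. The sketch below assumes the ID always directly follows a literal "/status/"; the names STATUS_ID and tweet_status_id are hypothetical, and this is not a description of how the gem works internally:

```ruby
# Assumption: the tweet ID is the run of digits right after "/status/".
STATUS_ID = %r{/status/(\d+)}

# Hypothetical helper: returns the ID as a string, or nil if none is found.
def tweet_status_id(url)
  match = STATUS_ID.match(url)
  match && match[1]
end

urls = [
  'https://twitter.com/#!/sportsguy33/status/41257488166686720',
  'http://twitter.com/#!/dailythunder/status/41382006113841153',
  'https://twitter.com/sferik/status/540897316908331009/photo/1'
]
urls.each { |url| puts tweet_status_id(url) }
```

Anchoring on the path segment means digits in the username and a trailing /photo/1 are both ignored.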


I would suggest you try out Rubular.

Rubular is a Ruby-based regular expression editor. It's a handy way to test regular expressions as you write them.
