Help with a regular expression for a date plus 3 fixed characters

开发者 https://www.devze.com 2023-02-19 23:30 Source: web
I have a url and I need to get rid of everything other than the code for the current date.

For example this is the link I have:

http://e-ditionsbyfry.com/Olive/ODE/WAC/Default.aspx?href=WAC/2011/03/01&pageno=18

The bit I need is:

WAC/2011/03/01

The 3 letters are always the same; however, the date will change, e.g.:

WAC/2012/04/02

Can someone please help me with the regular expression needed to find this sequence?


You can use the regex:

(WAC\/(?:19|20)\d\d\/(?:0[1-9]|1[012])\/(?:0[1-9]|[12][0-9]|3[01]))
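As a rough illustration, the pattern above could be applied in Python as follows. This is only a sketch: the names `DATE_CODE`, `url` and `code` are made up for the example, and the slashes are left unescaped because Python's `re` module does not require escaping them.

```python
import re

# Same pattern as above: "WAC/", a year starting with 19 or 20,
# a month 01-12 and a day 01-31. It does not check calendar validity
# (for example, 02/31 would still match).
DATE_CODE = re.compile(
    r"(WAC/(?:19|20)\d\d/(?:0[1-9]|1[012])/(?:0[1-9]|[12][0-9]|3[01]))"
)

url = "http://e-ditionsbyfry.com/Olive/ODE/WAC/Default.aspx?href=WAC/2011/03/01&pageno=18"

# search() skips the earlier "WAC/Default.aspx" because no year follows it.
match = DATE_CODE.search(url)
code = match.group(1) if match else None
print(code)
```

Because the month and day ranges are restricted, a string such as `WAC/2011/13/01` would not match at all.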


This would work in Python (not tested):

r'WAC/\d{4}/\d\d/\d\d'

To be safe, you could also search for the href= in front:

r'href=WAC/\d{4}/\d\d/\d\d'
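A minimal sketch of how the `href=` variant might be used with Python's `re` module. The capturing group and the variable names are additions for illustration, and each digit is written as `\d`.

```python
import re

url = "http://e-ditionsbyfry.com/Olive/ODE/WAC/Default.aspx?href=WAC/2011/03/01&pageno=18"

# Require "href=" in front, but capture only the code that follows it.
pattern = re.compile(r"href=(WAC/\d{4}/\d\d/\d\d)")

match = pattern.search(url)
code = match.group(1) if match else None
print(code)
```

Unlike the stricter pattern, this one accepts any digits, so something like `WAC/2011/99/99` would also match.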


If you are sure that the part you want will always be the value of the href key, then you can use:

/\?.*?href=(WAC[^\&]+)/
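The pattern above is written with `/.../` delimiters, which Python does not use. A hedged Python sketch of the same idea follows. The second half is an alternative that is not part of the answer: it parses the query string with the standard library's `urllib.parse` instead of a regex. All variable names are illustrative.

```python
import re
from urllib.parse import urlsplit, parse_qs

url = "http://e-ditionsbyfry.com/Olive/ODE/WAC/Default.aspx?href=WAC/2011/03/01&pageno=18"

# Same pattern without the delimiters: capture everything after "href="
# up to the next "&".
match = re.search(r"\?.*?href=(WAC[^&]+)", url)
code_from_regex = match.group(1) if match else None

# Alternative (an assumption, not from the answer): parse the query string
# and read the "href" key directly.
query = parse_qs(urlsplit(url).query)
code_from_query = query.get("href", [None])[0]

print(code_from_regex, code_from_query)
```

Both approaches return whatever follows `href=`, so they do not verify that the value actually looks like a date.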