I have a URL and I need to get rid of everything other than the code for the current date.
For example, this is the link I have:
http://e-ditionsbyfry.com/Olive/ODE/WAC/Default.aspx?href=WAC/2011/03/01&pageno=18
The bit I need is:
WAC/2011/03/01
The three letters are always the same; however, the date will change, e.g.:
WAC/2012/04/02
Can someone please help me with the regular expression needed to find this sequence?
You can use the regex:
(WAC\/(?:19|20)\d\d\/(?:0[1-9]|1[012])\/(?:0[1-9]|[12][0-9]|3[01]))
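As a rough sketch, the pattern above could be applied in Python like this. The names `DATE_CODE`, `url`, and `date_code` are only illustrative, and the escaped slashes are dropped because Python does not need them:

```python
import re

# Same pattern as above, without the escaped slashes.
# It restricts the year to 19xx/20xx, the month to 01-12, the day to 01-31.
DATE_CODE = re.compile(
    r"WAC/(?:19|20)\d\d/(?:0[1-9]|1[012])/(?:0[1-9]|[12][0-9]|3[01])"
)

url = "http://e-ditionsbyfry.com/Olive/ODE/WAC/Default.aspx?href=WAC/2011/03/01&pageno=18"

# search() scans the whole string; the earlier "WAC/Default.aspx" is skipped
# because it is not followed by a four-digit year.
match = DATE_CODE.search(url)
date_code = match.group(0) if match else None
print(date_code)  # WAC/2011/03/01
```

Note that this only checks the ranges of the individual fields, so a string such as WAC/2011/02/31 would still match.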
This would work in Python (not tested):
r'WAC/\d{4}/\d\d/\d\d'
To be safe, you could also search for the href= in front:
r'href=WAC/\d{4}/\d\d/\d\d'
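A minimal sketch of the href-anchored search, assuming you only want the date code and not the `href=` prefix. In a regex, `\d` matches any digit, whereas `[d]` matches only the literal letter d. The capture group and the helper name `extract_date_code` are illustrative additions, not part of the answer above:

```python
import re

# Anchor on "href=" and capture only the part after it.
HREF_DATE = re.compile(r"href=(WAC/\d{4}/\d\d/\d\d)")

def extract_date_code(url):
    """Return the WAC/yyyy/mm/dd code following href=, or None if absent."""
    match = HREF_DATE.search(url)
    return match.group(1) if match else None

url = "http://e-ditionsbyfry.com/Olive/ODE/WAC/Default.aspx?href=WAC/2011/03/01&pageno=18"
print(extract_date_code(url))  # WAC/2011/03/01
```

Returning None for a missing match lets the caller decide how to handle URLs that carry no date code.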
If you are sure that the part you want will always be the value of the href key, then:
/\?.*?href=(WAC[^\&]+)/
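In Python the same idea might look like the sketch below. The surrounding `/.../` delimiters are dropped, the ampersand needs no escaping, and `href_value` is a hypothetical helper name:

```python
import re

# Lazily skip from the "?" to the first "href=", then capture everything
# starting with WAC up to the next "&" (or the end of the string).
HREF_VALUE = re.compile(r"\?.*?href=(WAC[^&]+)")

def href_value(url):
    """Return the href value if it starts with WAC, else None."""
    match = HREF_VALUE.search(url)
    return match.group(1) if match else None

print(href_value("http://e-ditionsbyfry.com/Olive/ODE/WAC/Default.aspx?href=WAC/2011/03/01&pageno=18"))
# WAC/2011/03/01
```

This version does not check the date format at all: it accepts any href value that begins with WAC, so it is only appropriate if the URL layout is trusted.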