
Regex to find position between two markers in string

Developer https://www.devze.com 2023-02-11 07:51 Source: web

I need to find anything between

show_detail&

and

;session_id=1445045

in

https://www.site.gov.uk//search/cgi-bin/contract_search/contract_search.cgi?rm=show_detail&id=4035219;num=1;session_id=1445045;start=0;recs=20;subscription=1;value=0

using regex in Python.

I know I need to use lookbehind/lookahead, but I can't seem to make it work!

Please help!

Thanks :)


Why use a regex?

>>> url = 'https://www.site.gov.....'
>>> start = url.index('show_detail&') + len('show_detail&')
>>> end = url.index(';session_id=')
>>> url[start:end]
'id=4035219;num=1'
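One caveat with `index` is that it raises `ValueError` when a marker is missing. A minimal sketch of the same idea using `str.find`, wrapped in a helper (the name `between` is made up for illustration) that returns `None` instead:

```python
def between(text, start_marker, end_marker):
    """Return the text between the two markers, or None if either is missing."""
    start = text.find(start_marker)
    if start == -1:
        return None
    start += len(start_marker)
    # Look for the end marker only after the start marker.
    end = text.find(end_marker, start)
    if end == -1:
        return None
    return text[start:end]


url = ("https://www.site.gov.uk//search/cgi-bin/contract_search/contract_search.cgi"
       "?rm=show_detail&id=4035219;num=1;session_id=1445045;start=0;recs=20;"
       "subscription=1;value=0")
print(between(url, "show_detail&", ";session_id="))
```

For the URL in the question this prints the same `id=4035219;num=1` as the transcript above.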


>>> s= "https://www.site.gov.uk//search/cgi-bin/contract_search/contract_search.cgi?rm=show_detail&id=4035219;num=1;session_id=1445045;start=0;recs=20;subscription=1;value=0"
>>> s.split(";session_id=1445045")[0].split("show_detail&")[-1]
'id=4035219;num=1'
>>>
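A possible variation on the `split` idea is `str.partition`, which also tells you whether the marker was actually found. This is only a sketch; the variable names are mine:

```python
url = ("https://www.site.gov.uk//search/cgi-bin/contract_search/contract_search.cgi"
       "?rm=show_detail&id=4035219;num=1;session_id=1445045;start=0;recs=20;"
       "subscription=1;value=0")

# partition returns (head, separator, tail); the separator is "" if not found.
_, found_start, rest = url.partition("show_detail&")
middle, found_end, _ = rest.partition(";session_id=1445045")

# Only trust the result when both markers were present.
result = middle if found_start and found_end else None
print(result)
```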


You can use a non-greedy match (.*?) between your markers.

>>> import re
>>> url = "https://www.site.gov.uk//search/cgi-bin/contract_search/contract_search.cgi?rm=show_detail&id=4035219;num=1;session_id=1445045;start=0;recs=20;subscription=1;value=0"
>>> m = re.search("show_detail&(.*?);session_id=1445045", url)
>>> m.group(1)
'id=4035219;num=1'
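If the markers are not fixed, the same non-greedy pattern can be built from arbitrary strings. A rough sketch, assuming a helper called `between_re` (a hypothetical name) and using `re.escape` so that characters such as `.` or `?` in the markers are matched literally:

```python
import re


def between_re(text, start_marker, end_marker):
    """Return the shortest text between the markers, or None if there is no match."""
    # re.escape keeps regex metacharacters in the markers literal.
    pattern = re.escape(start_marker) + r"(.*?)" + re.escape(end_marker)
    m = re.search(pattern, text)
    return m.group(1) if m else None


url = ("https://www.site.gov.uk//search/cgi-bin/contract_search/contract_search.cgi"
       "?rm=show_detail&id=4035219;num=1;session_id=1445045;start=0;recs=20;"
       "subscription=1;value=0")
print(between_re(url, "show_detail&", ";session_id=1445045"))
```

Checking for `None` before calling `group` avoids an `AttributeError` when a marker is absent.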


regex = re.compile(r"(?<=show_detail&).*?(?=;session_id=1445045)")

should work. See here for more info on lookaround assertions.
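For completeness, here is how a lookaround pattern of this kind might be applied to the URL from the question. It is a sketch rather than the only way to do it; note that Python's `re` module only accepts fixed-width lookbehind patterns, which a literal marker satisfies:

```python
import re

url = ("https://www.site.gov.uk//search/cgi-bin/contract_search/contract_search.cgi"
       "?rm=show_detail&id=4035219;num=1;session_id=1445045;start=0;recs=20;"
       "subscription=1;value=0")

# The lookbehind and lookahead assert the markers without consuming them,
# so the whole match (group 0) is just the text in between.
lookaround = re.compile(r"(?<=show_detail&).*?(?=;session_id=1445045)")
m = lookaround.search(url)
found = m.group(0) if m else None
print(found)
```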


import re


url = "https://www.site.gov.uk//search/cgi-bin/contract_search/contract_search.cgi?rm=show_detail&id=4035219;num=1;session_id=1445045;start=0;recs=20;subscription=1;value=0"
# The end marker includes the leading ";" so group 3 does not pick up a trailing semicolon.
pattern = r"([^>].+)(show_detail&)([^>].+)(;session_id=1445045)([^>].+)"
reg = re.compile(pattern, flags=re.S)
match = reg.search(url)

# Group 3 holds the text between the two markers.
print(match.group(3))

I think this should work.
