
Regular expression for a string like this

Developer https://www.devze.com 2023-01-28 02:26 Source: Internet

I need to match ANY strings that start with:

'/Engine

and end with:

ir_vrn'

I have used this:

>>> vrn_page = re.compile('\'/Engine[a-zA-Z0-9._+-&/?:=]+ir_vrn\'')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.6/re.py", line 190, in compile
    return _compile(pattern, flags)
  File "/usr/lib/python2.6/re.py", line 245, in _compile
    raise error, v # invalid expression
sre_constants.error: bad character range

but it doesn't work with this string:

'/Engine/page/im/pop_mostra.php?P_=9078&P_Utentevisitatore=1702795&loto=http://s1.example.com/utloto/9/9078/Media/7df4164ecb81a5992280a1ce81120d05-3a5fa4377a23242690a273a82ea5d607&type=ir_vrn'


Try:

/Engine.*?ir_vrn

Note the question mark: it makes the * lazy (non-greedy), so that in

/Engined&^&^&^&ir_vrn@$@#$@#ir_vrn!@#!@#

it only catches

/Engined&^&^&^&ir_vrn

rather than

/Engined&^&^&^&ir_vrn@$@#$@#ir_vrn
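The effect of the question mark can be checked with Python's re module by running the lazy and greedy forms side by side on the example string above (the variable names are only illustrative):

```python
import re

text = "/Engined&^&^&^&ir_vrn@$@#$@#ir_vrn!@#!@#"

# Lazy: .*? grows one character at a time, so it stops at the FIRST "ir_vrn"
lazy = re.search(r"/Engine.*?ir_vrn", text).group()

# Greedy: .* first consumes everything, then backtracks to the LAST "ir_vrn"
greedy = re.search(r"/Engine.*ir_vrn", text).group()

print(lazy)    # /Engined&^&^&^&ir_vrn
print(greedy)  # /Engined&^&^&^&ir_vrn@$@#$@#ir_vrn
```

The lazy form only matters when ir_vrn can occur more than once in the text being searched; with a single occurrence both forms return the same match.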


It doesn't work because you're too restrictive on the middle part. Try this (the . stands for "any character except a newline" in regex):

\'/Engine.+?ir_vrn\'

Also, you may want to anchor the regex if it should match only strings that consist entirely of this pattern, not strings that merely contain it somewhere. The anchored regex would look like this:

^\'/Engine.+ir_vrn\'$
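A minimal sketch of the difference between the unanchored and anchored forms, assuming Python 3 and two made-up test strings: one that is exactly the pattern and one where it is embedded in other text. Raw double-quoted strings are used here, so the quotes need no backslashes; the \' in the patterns above comes from writing them inside a single-quoted Python literal.

```python
import re

unanchored = re.compile(r"'/Engine.+?ir_vrn'")
anchored = re.compile(r"^'/Engine.+ir_vrn'$")

# Hypothetical inputs: the bare quoted path, and the same path inside other text
exact = "'/Engine/page/im/pop_mostra.php?type=ir_vrn'"
embedded = "href='/Engine/page/im/pop_mostra.php?type=ir_vrn' class=x"

# search() without anchors finds the pattern anywhere in the string
print(bool(unanchored.search(exact)), bool(unanchored.search(embedded)))  # True True

# With ^...$ the whole string has to be the pattern
print(bool(anchored.search(exact)), bool(anchored.search(embedded)))  # True False
```

In Python 3.4+, re.fullmatch is an alternative to wrapping the pattern in ^...$; one difference is that $ also matches just before a trailing newline, while fullmatch does not.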


>>> import re
>>> regexp = "'/Engine.*ir_vrn'"
>>> re.match(regexp, "'/Engineir_vrn'")
<_sre.SRE_Match object at 0x101e2f9f0>
>>> re.match(regexp, "'/Engine/page/im/pop_mostra.php?P_=9078&P_Utentevisitatore=1702795&loto=http://s1.example.com/utloto/9/9078/Media/7df4164ecb81a5992280a1ce81120d05-3a5fa4377a23242690a273a82ea5d607&type=ir_vrn'")
<_sre.SRE_Match object at 0x101e2f988>
>>> 
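The session above uses .*, which also accepts an empty middle part, as its first call shows. The .+ and .+? versions in the other answers require at least one character between /Engine and ir_vrn. A small Python 3 check of that difference (in Python 3, re.match returns a re.Match object or None):

```python
import re

star = re.compile(r"'/Engine.*ir_vrn'")  # .* allows zero characters in the middle
plus = re.compile(r"'/Engine.+ir_vrn'")  # .+ requires at least one character

empty_middle = "'/Engineir_vrn'"
print(star.match(empty_middle) is not None)  # True
print(plus.match(empty_middle) is not None)  # False
```

Which one is right depends on whether '/Engineir_vrn' should count as a valid string; the question does not say.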


Why not ^\'/Engine.*ir_vrn\'$?


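('\'/Engine[a-zA-Z0-9._+-&/?:=]+ir_vrn\'') has a problem because - has a special meaning inside a character class: +-& is read as a range from + to &, and since + (0x2B) comes after & (0x26), the range is invalid. That is exactly the "bad character range" error in the traceback. Characters such as ?, :, + and . are literal inside [...], so they need no escaping there; the - must either be escaped as \- or placed first or last in the class.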
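A short Python 3 sketch to confirm where the compile error comes from, and that the question's original character class works once the hyphen is moved to the end (the variable names are illustrative):

```python
import re

# Inside a character class, "+-&" is parsed as a range from "+" (0x2B) to "&" (0x26).
# The start comes after the end, so compiling raises re.error ("bad character range").
try:
    re.compile(r"[+-&]")
    range_error = None
except re.error as exc:
    range_error = exc
print(range_error)  # exact wording depends on the Python version

# Other characters are literal inside [...]: "." matches only a dot, not any character.
dot_class = re.compile(r"[.?:+]")
print(bool(dot_class.fullmatch(".")), bool(dot_class.fullmatch("x")))  # True False

# Moving "-" to the end of the class makes it a literal hyphen, so the pattern compiles.
fixed = re.compile(r"'/Engine[a-zA-Z0-9._+&/?:=-]+ir_vrn'")

# The sample string from the question
url = ("'/Engine/page/im/pop_mostra.php?P_=9078&P_Utentevisitatore=1702795"
       "&loto=http://s1.example.com/utloto/9/9078/Media/"
       "7df4164ecb81a5992280a1ce81120d05-3a5fa4377a23242690a273a82ea5d607"
       "&type=ir_vrn'")
print(fixed.fullmatch(url) is not None)  # True
```

Every character in the sample string (letters, digits, / _ . ? = & : -) is in the repaired class, which is why it matches once the range error is gone.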

Also, you are misusing character ranges:

[A-Za-z0-9]+ will match one or more alphanumeric characters. [a-zA-Z0-9.] is also valid: inside a character class the . is a literal dot, so [a-zA-Z0-9\.] works too, but the backslash is unnecessary. Since you want a run of non-whitespace characters, \S works well.

import re
vrn_page = re.compile(r"'/Engine\S+ir_vrn'")  # \S+ : one or more non-whitespace characters
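A sketch of the \S+ approach applied to the sample string from the question. It assumes the surrounding quotes should be part of the match, as the question describes; the HTML wrapper in the second check is made up for illustration.

```python
import re

# \S+ = one or more characters that are not whitespace; quotes from the question included
vrn_page = re.compile(r"'/Engine\S+ir_vrn'")

url = ("'/Engine/page/im/pop_mostra.php?P_=9078&P_Utentevisitatore=1702795"
       "&loto=http://s1.example.com/utloto/9/9078/Media/"
       "7df4164ecb81a5992280a1ce81120d05-3a5fa4377a23242690a273a82ea5d607"
       "&type=ir_vrn'")

# The question's string has no whitespace, so \S+ covers the whole middle part
print(vrn_page.fullmatch(url) is not None)  # True

# search() also extracts the quoted URL from surrounding markup (hypothetical HTML)
html = "<a href=" + url + ">link</a>"
print(vrn_page.search(html).group() == url)  # True

# \S cannot cross whitespace, so a space in the middle prevents a match
print(vrn_page.search("'/Engine/page ir_vrn'") is None)  # True
```

Because \S+ is greedy, two such URLs written back to back with no whitespace between them would be matched as one span; \S+? would stop at the first ir_vrn' instead.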
