I need to match ANY strings that start with:
'/Engine
and end with:
ir_vrn'
I have used this:
vrn_page = re.compile('\'/Engine[a-zA-Z0-9._+-&/?:=]+ir_vrn\'')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/lib/python2.6/re.py", line 190, in compile
return _compile(pattern, flags)
File "/usr/lib/python2.6/re.py", line 245, in _compile
raise error, v # invalid expression
sre_constants.error: bad character range
The pattern needs to match strings like this one:
'/Engine/page/im/pop_mostra.php?P_=9078&P_Utentevisitatore=1702795&loto=http://s1.example.com/utloto/9/9078/Media/7df4164ecb81a5992280a1ce81120d05-3a5fa4377a23242690a273a82ea5d607&type=ir_vrn'
Try:
/Engine.*?ir_vrn
Note the question mark: it makes the * lazy (non-greedy), so that in
/Engined&^&^&^&ir_vrn@$@#$@#ir_vrn!@#!@#
it only catches
/Engined&^&^&^&ir_vrn
rather than
/Engined&^&^&^&ir_vrn@$@#$@#ir_vrn
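A small sketch of the lazy/greedy difference, using Python 3's re module on the example string above. re.search is used because the text does not start with the pattern's first character at position 0 in general.

```python
import re

text = "/Engined&^&^&^&ir_vrn@$@#$@#ir_vrn!@#!@#"

# Lazy: .*? expands one character at a time, so it stops at the FIRST "ir_vrn".
lazy = re.search(r"/Engine.*?ir_vrn", text)
# Greedy: .* grabs everything, then backtracks to the LAST "ir_vrn".
greedy = re.search(r"/Engine.*ir_vrn", text)

print(lazy.group(0))    # /Engined&^&^&^&ir_vrn
print(greedy.group(0))  # /Engined&^&^&^&ir_vrn@$@#$@#ir_vrn
```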
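It doesn't work because of the character class in the middle: +-& is read as a range from + to &, and since & comes before + that range is invalid, which is what raises the error. You don't need to list the allowed characters at all. Try this instead (. matches any character except a newline):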
\'/Engine.+?ir_vrn\'
Also, if the regex should only match strings that consist exactly of this pattern (rather than strings that merely contain it), you may want to anchor it. The anchored regex would look like this:
^\'/Engine.+ir_vrn\'$
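A minimal comparison of the unanchored and anchored versions, assuming Python 3's re module. The two sample strings are made-up, shortened variants of the URL from the question, one standing alone and one embedded in surrounding text.

```python
import re

# Unanchored and lazy: finds the quoted pattern anywhere in the input.
loose = re.compile(r"'/Engine.+?ir_vrn'")
# Anchored: the whole input must be the quoted pattern.
strict = re.compile(r"^'/Engine.+ir_vrn'$")

exact = "'/Engine/page/im/pop_mostra.php?type=ir_vrn'"
embedded = "prefix '/Engine/page/im/pop_mostra.php?type=ir_vrn' suffix"

print(bool(loose.search(exact)), bool(strict.search(exact)))        # True True
print(bool(loose.search(embedded)), bool(strict.search(embedded)))  # True False
```

With the anchors in place, lazy versus greedy no longer matters, because the match must end at the end of the string anyway.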
>>> import re
>>> regexp = "'/Engine.*ir_vrn'"
>>> re.match(regexp, "'/Engineir_vrn'")
<_sre.SRE_Match object at 0x101e2f9f0>
>>> re.match(regexp, "'/Engine/page/im/pop_mostra.php?P_=9078&P_Utentevisitatore=1702795&loto=http://s1.example.com/utloto/9/9078/Media/7df4164ecb81a5992280a1ce81120d05-3a5fa4377a23242690a273a82ea5d607&type=ir_vrn'")
<_sre.SRE_Match object at 0x101e2f988>
>>>
Why not ^\'/Engine.*ir_vrn\'$ ?
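The practical difference between .* and .+ in the anchored pattern is whether an empty middle part is accepted. A quick check in Python 3, using the same "'/Engineir_vrn'" input as the session above:

```python
import re

star = re.compile(r"^'/Engine.*ir_vrn'$")  # middle part may be empty
plus = re.compile(r"^'/Engine.+ir_vrn'$")  # middle part needs at least one character

empty_middle = "'/Engineir_vrn'"
print(bool(star.match(empty_middle)))  # True
print(bool(plus.match(empty_middle)))  # False
```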
('\'/Engine[a-zA-Z0-9._+-&/?:=]+ir_vrn\'')
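has a problem because - is special inside a character class: +-& is read as a range from + to &, and because & comes before + in ASCII, Python rejects it as a "bad character range". The other characters (?, :, + and .) have special meanings elsewhere in a regular expression, but inside square brackets they are literal and need no escaping. To use a literal -, put it first or last in the class, or escape it as \-.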
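A sketch reproducing the error and one possible fix, assuming Python 3's re module. The fix simply moves - to the end of the question's class; the sample is a shortened version of the question's URL.

```python
import re

# In the question's class, "+-&" is parsed as a range from "+" (0x2B) to "&" (0x26).
# The range runs backwards, so compiling raises re.error ("bad character range").
try:
    re.compile(r"'/Engine[a-zA-Z0-9._+-&/?:=]+ir_vrn'")
    compile_error = None
except re.error as exc:
    compile_error = exc
print("error:", compile_error)  # exact wording varies between Python versions

# Moving "-" to the end of the class makes it a literal hyphen.
fixed = re.compile(r"'/Engine[a-zA-Z0-9._+&/?:=-]+ir_vrn'")
sample = "'/Engine/page/im/pop_mostra.php?P_=9078&type=ir_vrn'"
print(bool(fixed.match(sample)))  # True
```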
Also, listing every allowed character is more work than necessary. [A-Za-z0-9]+ matches one or more alphanumeric characters; [a-zA-Z0-9.] and [a-zA-Z0-9\.] are both valid and equivalent, since a dot inside brackets is literal. Since you want any printable, non-whitespace character, \S will work well:
import re
vrn_page = re.compile(r'/Engine\S+ir_vrn')
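A quick check of the \S+ approach against the full URL from the question, assuming Python 3 (the pattern is written without a backslash before /, which needs no escaping). The second input is a made-up counterexample showing that \S+ stops at whitespace.

```python
import re

vrn_page = re.compile(r"/Engine\S+ir_vrn")

url = ("'/Engine/page/im/pop_mostra.php?P_=9078&P_Utentevisitatore=1702795"
       "&loto=http://s1.example.com/utloto/9/9078/Media/"
       "7df4164ecb81a5992280a1ce81120d05-3a5fa4377a23242690a273a82ea5d607"
       "&type=ir_vrn'")

m = vrn_page.search(url)
print(m.group(0))  # the URL without its surrounding quotes

# \S cannot match a space, so this input has no match.
print(vrn_page.search("'/Engine/page with space?type=ir_vrn'"))  # None
```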