
Can't pass value into re.findall (python)

Developer (https://www.devze.com), 2023-03-15 06:02, source: web

Can anyone help me understand why this works...

z = re.findall(r'(foobar)', string)

But this doesn't?

regexStr = "r'(foobar)'"
z = re.findall(regexStr, string)

I've printed regexStr and determined that its output is IDENTICAL to r'(foobar)'.

Can someone please help? I've also tried escaping the apostrophes.

JD


The "r" modifier on a string literal should be outside the quotes:

regexStr = r'(foobar)'

From the docs - "String literals may optionally be prefixed with a letter 'r' or 'R'; such strings are called raw strings and use different rules for interpreting backslash escape sequences."
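A minimal sketch of the difference, assuming Python 3: an r written before the quotes is consumed by the parser, while the r and single quotes inside "r'(foobar)'" are ordinary characters of the string. The sample text is hypothetical.

```python
import re

text = "foobar"  # hypothetical input string

# r outside the quotes: the parser strips it, so the string is just "(foobar)".
good_pattern = r'(foobar)'
# r and the single quotes inside the double quotes are part of the content.
bad_pattern = "r'(foobar)'"

print(good_pattern)       # (foobar)
print(bad_pattern)        # r'(foobar)'
print(len(good_pattern))  # 8
print(len(bad_pattern))   # 11

print(re.findall(good_pattern, text))  # ['foobar']
print(re.findall(bad_pattern, text))   # []
```

Printing bad_pattern shows the same characters as the source code r'(foobar)', which is why the two look identical even though they are different strings.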


A solution to your problem is

regexStr = r'(%s)' % searchString

where the value of searchString replaces %s.

In Python it is often better to use this construct than plain concatenation (str1 + str2 + ...), especially as you don't have to worry about converting ints, doubles and so on.

More on the subject here: 3.5. Formatting Strings
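A sketch of the %-formatting approach, assuming the search term is held in a variable; searchString follows the answer's naming and the sample values are hypothetical. re.escape() is an addition not found in the answer: it makes characters such as "." in the term match literally instead of acting as regex metacharacters.

```python
import re

searchString = "foo.bar"   # hypothetical search term containing a regex metacharacter
text = "foo.bar fooXbar"   # hypothetical input

# Plain %-substitution: "." in the term means "any character".
loose = r'(%s)' % searchString
# Escaping the term first makes every one of its characters literal.
strict = r'(%s)' % re.escape(searchString)

print(re.findall(loose, text))   # ['foo.bar', 'fooXbar']
print(re.findall(strict, text))  # ['foo.bar']
```

If the value is meant as a literal string rather than a regex fragment, escaping it is usually the safer choice.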

The r should not be part of the string; it only tells the Python interpreter what kind of string literal it is:

r'hello\n'        # raw string => hello\n (a backslash followed by n, not a newline)
u'unicodestring'  # Unicode string


The r modifier builds a raw string. It tells Python not to convert backslash escape sequences such as \t or \n into special characters in a string literal. It has nothing to do with regular expression escape sequences.

>>> len('\t')  # tab character only
1
>>> len(r'\t') # backslash character followed by a 't'
2

However, regular expression syntax has its own set of escaping rules, which often collide with the escape rules of Python string literals. The r prefix only helps us deal with one of them. For example, the first pattern below matches a word character, and so does the second, because without the r prefix Python converts \\ to \ in a string literal.

>>> re.compile(r'\w') == re.compile('\\w')
True
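An example of where the two escaping layers collide, assuming Python 3: in a normal string literal \b becomes a backspace character before re ever sees it, whereas the regex engine reads \b as a word boundary. The sample text is hypothetical.

```python
import re

text = "foobar foo barfoo"  # hypothetical input

# '\b' in a normal literal is one backspace character, not backslash + b.
print(len('\b'), len(r'\b'))  # 1 2

# The engine receives backspace characters, which the text does not contain.
print(re.findall('\bfoo\b', text))    # []
# Raw string: the engine receives \b and treats it as a word boundary.
print(re.findall(r'\bfoo\b', text))   # ['foo']
# Doubling the backslash in a normal literal is equivalent to the raw form.
print(re.findall('\\bfoo\\b', text))  # ['foo']
```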

In your case r'(foobar)' is exactly equivalent to '(foobar)' because there is no backslash sequence to escape. This string is a regular expression only in your mind: the r prefix does not tell Python to interpret the string that way. Python only knows about regular expression objects, which you build with re.compile() or which functions like re.findall() compile implicitly.

The regular expression compiler has its own escaping rules, separate from Python's string-literal escape rules, and those regex rules are unrelated to the r prefix. The regex compiler never sees the r or understands its special meaning; only the Python interpreter does.
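A small illustration of the regex compiler's own escape handling, assuming Python 3: the engine interprets the two-character sequence \n by itself, so a pattern holding a real newline and a raw pattern holding backslash + n both match a newline. The sample text is hypothetical.

```python
import re

text = "line1\nline2"  # hypothetical input containing one newline

real_newline = '\n'  # one character: the newline itself
escaped_n = r'\n'    # two characters: backslash followed by 'n'

print(len(real_newline), len(escaped_n))  # 1 2

# Python's string rules produce two different strings...
print(real_newline == escaped_n)  # False
# ...but the regex compiler turns the \n escape into a newline, so both match.
print(re.findall(real_newline, text))  # ['\n']
print(re.findall(escaped_n, text))     # ['\n']
```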

In your case the r in "r'(foobar)'" gets no special treatment, because it sits inside the quotes and is passed to the regex compiler as an ordinary character. You are effectively building a regular expression that matches a literal r, followed by a single quote, then foobar (captured as a group), then another single quote. That's why the two expressions differ.

>>> re.compile(r'(foobar)') == re.compile("r'(foobar)'")  # your expressions
False
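A sketch of what the question's pattern actually matches, assuming Python 3 and hypothetical inputs: because r and the single quotes are literal characters of regexStr, the pattern only matches text that itself contains r'foobar'.

```python
import re

regexStr = "r'(foobar)'"  # the pattern from the question

# Plain "foobar" lacks the literal r and quotes, so nothing matches.
print(re.findall(regexStr, "foobar"))              # []
# Text that literally contains r'foobar' matches; the group still captures foobar.
print(re.findall(regexStr, "x = r'foobar' here"))  # ['foobar']
# Removing the extra characters from the string content fixes the pattern.
print(re.findall('(foobar)', "foobar"))            # ['foobar']
```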

The r prefix has no effect here, because neither pattern contains a backslash:

>>> re.compile(r'(foobar)') == re.compile('(foobar)')
True
>>> re.compile(r"r'(foobar)'") == re.compile("r'(foobar)'")
True

For more information:

  • http://docs.python.org/reference/lexical_analysis.html#string-literals
  • http://docs.python.org/library/re.html#regular-expression-syntax
