I have the following regular expression, which I think should match any character that is not alphanumeric, '!', '?', or '.':
a = re.compile('[^A-z ?!.]')
However, I get the following weird result in IPython:
In [21]: re.sub(a, ' ', 'Hey !$%^&*.#$%^&.')
Out[21]: 'Hey !  ^  .   ^ .'
The result is the same when I escape the '.' in the regular expression.
How do I match the caret so that it is removed from the string as well?
You have an error in your regular expression. Note that the case of the a and z is important. A-z includes all characters between ASCII value 65 (A) and 122 (z), which includes the caret character (ASCII code 94).
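To see exactly what the A-z range picks up besides letters, here is a small sketch that lists the ASCII characters sitting between 'Z' and 'a'. The variable name `between` is just for illustration.

```python
# 'Z' is ASCII 90 and 'a' is ASCII 97, so codes 91..96 lie between them.
# A range written as A-z spans 65..122 and therefore includes all of these.
between = [chr(code) for code in range(ord('Z') + 1, ord('a'))]
print(between)  # ['[', '\\', ']', '^', '_', '`']
```

Because the caret is in that list, a negated class built on A-z treats it as an allowed character and leaves it alone.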
Try this instead:
re.compile('[^A-Za-z ?!.]')
Example:
import re
regex = re.compile('[^A-Za-z ?!.]')
result = regex.sub(' ', 'Hey !$%^&*.#$%^&.')
print(result)
Result:
Hey !     .     .
The caret falls between the upper and lower cases in ASCII. You need [^a-zA-Z ?!\.]
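As a quick check, the sketch below compares the escaped and unescaped forms of the dot inside the character class; inside square brackets the dot is already literal, so both should behave the same. The `with_digits` pattern is an assumption on my part: the question says "alphanumeric", so if digits should be kept too, adding 0-9 to the class is one way to do it.

```python
import re

text = 'Hey !$%^&*.#$%^&.'

# Inside [...] a dot is literal, so escaping it is optional.
escaped = re.compile(r'[^a-zA-Z ?!\.]')
unescaped = re.compile(r'[^a-zA-Z ?!.]')
same = escaped.sub(' ', text) == unescaped.sub(' ', text)
print(same)  # True

# Hypothetical variant that also keeps digits (assumes "alphanumeric" was meant literally).
with_digits = re.compile(r'[^a-zA-Z0-9 ?!.]')
cleaned = with_digits.sub(' ', 'Room 101!^')
print(repr(cleaned))  # 'Room 101! '
```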