I have a set of strings from a log file that I need to parse:
timestamp - user not found : user1
timestamp - exception in xyz.security.plugin: global error : low memory
I want to capture the text between "-" and the last ":".
Currently I am using r' -(.*?)\n', which captures the string up to the end of the line. Please bear in mind that a line may contain more than two colons: I need to capture up to the very last colon before the end of the line. Also, if there is no colon in the line, it should take EOL as the ending sequence.
Thanks.
EDIT: better examples:
2011-07-29 07:29:44,112 [TP-Processor10] ERROR springsecurity.GrailsDaoImpl - User not found: sspm
2011-07-29 09:01:05,850 [TP-Processor3] ERROR transaction.JDBCTransaction - JDBC commit failed
2011-07-29 08:32:00,353 [TP-Processor1] ERROR errors.GrailsExceptionResolver - Exception occurred when processing request: [POST] /webapp/user/index - parameters: runtime exception
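For comparison, the rule described above (take the text after the first " - ", stop at the last colon, or run to the end of the line when there is no colon) can also be sketched without a regex, using str.partition and str.rpartition. This is a minimal sketch under stated assumptions, not part of the original question: it assumes the separator is always " - " with spaces (a bare "-" also appears in the timestamps and thread names), reads "take EOL as the ending sequence" as "run to the end of the line", and the function name message_part is hypothetical.

```python
def message_part(line):
    """Text after the first ' - ', cut at the last colon if there is one."""
    # Assumption: the logger prefix itself never contains " - ".
    _, sep, rest = line.rstrip("\n").partition(" - ")
    if not sep:
        return None  # no separator on this line
    # rpartition returns ("", "", rest) when rest contains no colon.
    head, colon, _ = rest.rpartition(":")
    return head if colon else rest


EXAMPLES = [
    "2011-07-29 07:29:44,112 [TP-Processor10] ERROR springsecurity.GrailsDaoImpl - User not found: sspm",
    "2011-07-29 09:01:05,850 [TP-Processor3] ERROR transaction.JDBCTransaction - JDBC commit failed",
    "2011-07-29 08:32:00,353 [TP-Processor1] ERROR errors.GrailsExceptionResolver - Exception occurred when processing request: [POST] /webapp/user/index - parameters: runtime exception",
]

for example in EXAMPLES:
    print(message_part(example))
```

Splitting on the first " - " and the last ":" mirrors what a greedy regex group does, without any backtracking to reason about.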
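import re

with open('logfile.log') as log:
    for line in log:
        # Greedy (.*) runs to the last colon; ' - ' skips the hyphens in the timestamp
        match = re.search(r' - (.*):', line)
        if match:
            print(match.group(1))
        else:
            # No colon after the separator: take everything up to the end of the line
            match = re.search(r' - (.*)', line)
            if match:
                print(match.group(1))
            else:
                print('No match in line', line.strip())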
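Wrapped as a function so it can be checked against the three example lines from the question, the same two-step approach (try "up to the last colon" first, fall back to "up to end of line") might look like this. It is a sketch, not the answerer's code: the function name message_part is hypothetical, and the pattern is anchored on " - " with spaces because a bare "-" would also match inside "2011-07-29" and "TP-Processor10".

```python
import re

# Greedy (.*) backtracks from the end of the line to the LAST colon.
# The pattern starts at " - " so hyphens inside the timestamp are skipped.
WITH_COLON = re.compile(r' - (.*):')
# Fallback for lines with no colon after the separator: take the rest.
TO_EOL = re.compile(r' - (.*)')


def message_part(line):
    """Return the text after the first ' - ', up to the last colon if any."""
    match = WITH_COLON.search(line) or TO_EOL.search(line)
    return match.group(1) if match else None


examples = [
    "2011-07-29 07:29:44,112 [TP-Processor10] ERROR springsecurity.GrailsDaoImpl - User not found: sspm",
    "2011-07-29 09:01:05,850 [TP-Processor3] ERROR transaction.JDBCTransaction - JDBC commit failed",
    "2011-07-29 08:32:00,353 [TP-Processor1] ERROR errors.GrailsExceptionResolver - Exception occurred when processing request: [POST] /webapp/user/index - parameters: runtime exception",
]

for example in examples:
    print(message_part(example))
```

Because "." does not match a newline, lines read straight from the file (with their trailing "\n") give the same results.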
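Try this:
r"(?<= - ).*(?=:[^:]*$)"
It matches everything between the " - " separator and the last ":" in the current line. (A bare "-" in the lookbehind would also fire inside the timestamp, right after "2011-", so the separator is matched together with its surrounding spaces.) If there is no colon after the separator, it won't match at all, so you can do:
import re

# mystring holds one line of the log
r = re.compile(r"(?<= - ).*(?=:[^:]*$)")
result = r.search(mystring)
if result:
    match = result.group(0)
else:
    match = "\n"
This does literally what you said ("if there is no colon, match EOL"). If you meant "if there is no colon, match until EOL", then a single regex will do:
r = re.compile(r"(?<= - )(?:[^:]*$|.*(?=:[^:]*$))")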
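To see the single-regex version handle both cases, here is a small sketch that runs it over the example lines. It assumes the lookbehind is anchored on " - " (a fixed-width lookbehind, which Python's re accepts) rather than a bare "-", and message_part is a hypothetical helper name. The first alternative takes the rest of the line when no colon follows; the second stops just before the last colon.

```python
import re

# Lookbehind places the match start right after " - ".
# Alternative 1: no colon left on the line, so take everything to the end.
# Alternative 2: otherwise take everything up to (not including) the last colon.
PATTERN = re.compile(r"(?<= - )(?:[^:]*$|.*(?=:[^:]*$))")


def message_part(line):
    # Strip the newline first so alternative 1 does not swallow it.
    match = PATTERN.search(line.rstrip("\n"))
    return match.group(0) if match else None


examples = [
    "2011-07-29 07:29:44,112 [TP-Processor10] ERROR springsecurity.GrailsDaoImpl - User not found: sspm",
    "2011-07-29 09:01:05,850 [TP-Processor3] ERROR transaction.JDBCTransaction - JDBC commit failed",
    "2011-07-29 08:32:00,353 [TP-Processor1] ERROR errors.GrailsExceptionResolver - Exception occurred when processing request: [POST] /webapp/user/index - parameters: runtime exception",
]

for example in examples:
    print(message_part(example))
```

The colons in the timestamps do not interfere, because the lookbehind only allows matches to start after the separator.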
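r'^.+? -(.+):.*$'
does the trick for me.
This works because the (.+) group is greedy: it runs to the end of the line and then backtracks to the last colon. The leading .+? has to be lazy so that it stops at the first " -"; a greedy ^.+ would also backtrack and land on the last " -" instead, which on the third example line leaves only " parameters". Note that the pattern needs at least one colon after the separator, so a line like the JDBC one does not match. Check the Python documentation for the re module, in particular the entries for *, +, and ?.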
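The effect of greediness is easiest to see on the third example line, which has two " - " separators and two colons after the first one. This sketch compares greedy and lazy forms of both the capture group and the leading prefix; the variable names are illustrative only.

```python
import re

LINE = ("2011-07-29 08:32:00,353 [TP-Processor1] ERROR errors.GrailsExceptionResolver"
        " - Exception occurred when processing request: [POST] /webapp/user/index"
        " - parameters: runtime exception")

# Greedy group: backtracks from the end of the line to the LAST colon.
greedy_group = re.search(r' - (.+):', LINE).group(1)
# Lazy group: stops at the FIRST colon after the separator.
lazy_group = re.search(r' - (.+?):', LINE).group(1)

# A greedy prefix (^.+) also backtracks from the end, so it lands on the LAST " -".
greedy_prefix = re.match(r'^.+ -(.+):.*$', LINE).group(1)
# A lazy prefix (^.+?) stops at the FIRST " -" instead.
lazy_prefix = re.match(r'^.+? -(.+):.*$', LINE).group(1)

print("greedy group: ", repr(greedy_group))
print("lazy group:   ", repr(lazy_group))
print("greedy prefix:", repr(greedy_prefix))
print("lazy prefix:  ", repr(lazy_prefix))
```

Only the combination of a lazy prefix and a greedy group captures from the first separator to the last colon.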