Python regex - last occurrence before EOL

Developer https://www.devze.com 2023-04-06 06:59 Source: Internet

I have a set of strings from a log file that I need to parse:

    timestamp - user not found : user1
    timestamp - exception in xyz.security.plugin: global error : low memory

I want to capture the text between "-" and the last ":".

Currently I am using r' -(.*?)\n', which captures everything up to the EOL. Please bear in mind that the string may contain more than two colons; I need to capture up to the very last colon before the EOL. Also, if there are no colons in the string, it should take the EOL as the ending sequence.

thanks.

EDIT: better examples:

    2011-07-29 07:29:44,112 [TP-Processor10] ERROR springsecurity.GrailsDaoImpl  - User not found: sspm
    2011-07-29 09:01:05,850 [TP-Processor3] ERROR transaction.JDBCTransaction  - JDBC commit failed
    2011-07-29 08:32:00,353 [TP-Processor1] ERROR errors.GrailsExceptionResolver  - Exception occurred when processing request: [POST] /webapp/user/index - parameters: runtime exception


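import re

with open('logfile.log') as log:
    for line in log:
        # Greedy (.*) backtracks to the last ':' on the line.
        # ' - ' (with spaces) skips hyphens inside the date and thread name.
        match = re.search(r' - (.*):', line)
        if match is None:
            # No colon after the separator: capture up to the end of the line.
            match = re.search(r' - (.*)', line)
        if match:
            print(match.group(1))
        else:
            print('No match in line', line.strip())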
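
The same two-step fallback can be sketched as a function and run against the three sample lines from the question's EDIT, so it can be tried without a log file. This is a Python 3 sketch: the name `extract_message` is hypothetical, and anchoring on ` - ` with surrounding spaces is an assumption made so that hyphens inside the date are not taken as the start.

```python
import re

def extract_message(line):
    """Return the text between ' - ' and the last ':' (or EOL if no colon follows)."""
    # Greedy (.*) backtracks to the last ':' after the separator.
    match = re.search(r' - (.*):', line)
    if match is None:
        # No colon after the separator: capture up to the end of the line.
        match = re.search(r' - (.*)', line)
    return match.group(1) if match else None

samples = [
    "2011-07-29 07:29:44,112 [TP-Processor10] ERROR springsecurity.GrailsDaoImpl  - User not found: sspm",
    "2011-07-29 09:01:05,850 [TP-Processor3] ERROR transaction.JDBCTransaction  - JDBC commit failed",
    "2011-07-29 08:32:00,353 [TP-Processor1] ERROR errors.GrailsExceptionResolver  - Exception occurred when processing request: [POST] /webapp/user/index - parameters: runtime exception",
]

for sample in samples:
    print(extract_message(sample))
```

For these three lines it prints "User not found", "JDBC commit failed", and "Exception occurred when processing request: [POST] /webapp/user/index - parameters".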


Try this:

"(?<=-).*(?=:[^:]*$)"

It matches the text between a - and the last : on the current line. If there is no colon, it won't match at all, so you can do:

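import re

# Lookbehind for "-", lookahead for the last ":" on the line.
r = re.compile(r"(?<=-).*(?=:[^:]*$)")
result = r.search(mystring)  # mystring holds one line of the log
if result:
    match = result.group(0)
else:
    match = "\n"  # no colon: fall back to the EOL itself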

This does what you said ("if there is no colon, match EOL"). If you meant "if there is no colon, match until EOL", then a single regex will do:

r = re.compile("(?<=-)(?:[^:]*$|.*(?=:[^:]*$))")
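
To see how the single pattern behaves, here is a small Python 3 sketch that runs it on lines shaped like the question's first examples. The helper `capture` and the `spaced` variant are assumptions added for illustration: on the EDIT examples the bare `-` lookbehind would anchor at the first hyphen of the date and return a slice of the timestamp, so `spaced` requires ` - ` instead.

```python
import re

# Pattern quoted from the answer above: stop at the last ':' or, failing that, at EOL.
single = re.compile(r"(?<=-)(?:[^:]*$|.*(?=:[^:]*$))")

# Assumed variant: require ' - ' so hyphens inside a date are not taken as the start.
spaced = re.compile(r"(?<= - )(?:[^:]*$|.*(?=:[^:]*$))")

def capture(pattern, line):
    """Return the stripped match, or None when the pattern finds nothing."""
    found = pattern.search(line)
    return found.group(0).strip() if found else None

dated = "2011-07-29 09:01:05,850 [TP-Processor3] ERROR transaction.JDBCTransaction  - JDBC commit failed"

print(capture(single, "timestamp - user not found : user1"))
print(capture(single, "timestamp - low memory"))
print(capture(spaced, dated))
```

The match itself keeps the spaces around the captured text, which is why the helper strips the result.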


r'^.+ -(.+):.*$' does the trick for me.

This works because (.+) is greedy: it consumes as much as it can and then backtracks only as far as the last colon. See the Python documentation for the re module, in particular the entries for *, +, and ?.
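
The effect of greediness can be checked directly by comparing `(.+)` with its lazy form `(.+?)`. This is a Python 3 sketch, and the shortened line `two_dashes` is made up for illustration. It also shows two limits of this pattern: it has no fallback when the line contains no colon, and because the leading `^.+` is greedy as well, a line with more than one " -" is split at the last one.

```python
import re

line = "timestamp - exception in xyz.security.plugin: global error : low memory"

greedy = re.search(r'^.+ -(.+):.*$', line)   # (.+) backtracks to the last ':'
lazy = re.search(r'^.+ -(.+?):.*$', line)    # (.+?) stops at the first ':'

print(repr(greedy.group(1)))
print(repr(lazy.group(1)))

# No colon after the dash: this pattern has no fallback, so search() returns None.
print(re.search(r'^.+ -(.+):.*$', "timestamp - low memory"))

# The leading ^.+ is greedy too, so with several " -" in a line the last one wins.
two_dashes = "ts - request: [POST] /webapp/user/index - parameters: runtime exception"
print(repr(re.search(r'^.+ -(.+):.*$', two_dashes).group(1)))
```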
