Parsing a multiline variable-length log file

Developer https://www.devze.com 2022-12-19 09:22 Source: Internet
I want to be able to utilize a 'grep' or 'pcregrep -M' like solution that parses a log file that fits the following parameters:

  • Each log entry can be multiple lines in length
  • First line of log entry has the key that I want to search for
  • Each key appears on more than one line

So in the example below I would want to return every line that has KEY1 on it and all the supporting lines below it until the next log message.

Log file:
01 Feb 2010 - 10:39:01.755, DEBUG - KEY1:randomtext
        blah
        blah2 T
        blah3 T
        blah4 F
        blah5 F
        blah6
        blah7
01 Feb 2010 - 10:39:01.757, DEBUG - KEY1:somethngelse
01 Feb 2010 - 10:39:01.758, DEBUG - KEY2:randomtest
this is a test
01 Feb 2010 - 10:39:01.760, DEBUG - KEY1:more logs here
01 Feb 2010 - 10:39:01.762, DEBUG - KEY1:eve more here
this is another multiline log entry
keeps on going
but not as long as before
01 Feb 2010 - 10:39:01.763, DEBUG - KEY2:testing
test test test
end of key2
01 Feb 2010 - 10:39:01.762, DEBUG - KEY1:but key 1 is still going
and going
and going
and going
and going
and going
and going
and going
and going
and going
and going
and going
and going
okay enough
01 Feb 2010 - 10:39:01.762, DEBUG - KEY3:and so on
and on
Desired output of searching for KEY1:
01 Feb 2010 - 10:39:01.755, DEBUG - KEY1:randomtext
        blah
        blah2 T
        blah3 T
        blah4 F
        blah5 F
        blah6
        blah7
01 Feb 2010 - 10:39:01.757, DEBUG - KEY1:somethngelse

01 Feb 2010 - 10:39:01.760, DEBUG - KEY1:more logs here
01 Feb 2010 - 10:39:01.762, DEBUG - KEY1:eve more here
this is another multiline log entry
keeps on going
but not as long as before
01 Feb 2010 - 10:39:01.762, DEBUG - KEY1:but key 1 is still going
and going
and going
and going
and going
and going
and going
and going
and going
and going
and going
and going
and going
okay enough

I was trying to do something like:

pcregrep -M 'KEY1(.*\n)+' logfile

but that definitely doesn't work right.


If you are on *nix, you can use a shell script that calls awk:

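#!/bin/bash
read -r -p "Enter key: " key
awk -v key="$key" '
# A line with the log level but without the key starts another entry: stop printing
$0 ~ /DEBUG/ && $0 !~ key { f = 0 }
# A line with the key: start printing
$0 ~ key { f = 1 }
f { print } ' file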

output

$ cat file
01 Feb 2010 - 10:39:01.755, DEBUG - KEY1:randomtext
        blah                                       
        blah2 T                                    
        blah3 T                                    
        blah4 F                                    
        blah5 F                                    
        blah6                                      
        blah7                                      
01 Feb 2010 - 10:39:01.757, DEBUG - KEY1:somethngelse
01 Feb 2010 - 10:39:01.758, DEBUG - KEY2:randomtest  
this is a test                                       
01 Feb 2010 - 10:39:01.760, DEBUG - KEY1:more logs here
01 Feb 2010 - 10:39:01.762, DEBUG - KEY1:eve more here 
this is another multiline log entry                    
keeps on going                                         
but not as long as before                              
01 Feb 2010 - 10:39:01.763, DEBUG - KEY2:testing       
test test test                                         
end of key2                                            
01 Feb 2010 - 10:39:01.762, DEBUG - KEY1:but key 1 is still going
and going                                                        
and going                                                        
and going                                                        
and going                                                        
and going                                                        
and going                                                        
and going                                                        
and going                                                        
and going                                                        
and going
and going
and going
okay enough
01 Feb 2010 - 10:39:01.762, DEBUG - KEY3:and so on
and on

$ ./shell.sh
Enter key: KEY1
01 Feb 2010 - 10:39:01.755, DEBUG - KEY1:randomtext
        blah
        blah2 T
        blah3 T
        blah4 F
        blah5 F
        blah6
        blah7
01 Feb 2010 - 10:39:01.757, DEBUG - KEY1:somethngelse
01 Feb 2010 - 10:39:01.760, DEBUG - KEY1:more logs here
01 Feb 2010 - 10:39:01.762, DEBUG - KEY1:eve more here
this is another multiline log entry
keeps on going
but not as long as before
01 Feb 2010 - 10:39:01.762, DEBUG - KEY1:but key 1 is still going
and going
and going
and going
and going
and going
and going
and going
and going
and going
and going
and going
and going
okay enough
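
One caveat with the script above: it treats any line containing DEBUG as the start of a new entry and any line containing the key as a match, so a continuation line that happens to contain either string would flip the flag. A minimal sketch that keys on the leading timestamp instead, assuming every entry starts with a "DD Mon YYYY - " prefix as in the sample log (the function name filter_entries is made up for illustration):

```shell
#!/bin/bash
# Sketch: print every log entry whose first line contains the key.
# Assumption: each entry starts with a "DD Mon YYYY - " timestamp, as in the sample log.
# filter_entries is a hypothetical helper name, not part of the original answer.
filter_entries() {
    local key=$1 file=$2
    awk -v key="$key" '
        # A leading timestamp marks the start of a new entry.
        /^[0-9][0-9] [A-Za-z][A-Za-z][A-Za-z] [0-9][0-9][0-9][0-9] - / {
            # Keep this entry only if its first line contains the key (literal match).
            f = (index($0, key) > 0)
        }
        f   # print the line while the flag is set
    ' "$file"
}

# Usage: filter_entries KEY1 logfile
```

The key is matched as a literal substring with index() rather than as a regex, and only on timestamped lines, so continuation lines never change the flag. The timestamp pattern spells out each digit because some awk implementations do not support {n} repetition.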


I had a similar requirement and decided to code a little tool (in .NET) that parses log files for me and writes the result to standard output.

Maybe you'll find it useful. It works on Windows and Linux (Mono).

See here: https://github.com/iohn2000/ParLog

A tool to filter log files for log entries that contain a specific (regex) pattern. It also works with multiline log entries, e.g. to show only the log entries from a certain workflow instance. It writes the result to standard output; use '>' to redirect it into a file.

The default startPattern is:

^[0-9]{2} [\w]{3} [0-9]{4} [0-9]{2}:[0-9]{2}:[0-9]{2},[0-9]{3}

This corresponds to a date format such as 04 Feb 2017 15:02:50,778.

Parameters are:

f:wildcard      a file name or wildcard for multiple files
p:pattern       the regex pattern to filter the file(s)
s:startPattern  regex pattern to define when a new log entry starts

Example:

ParLog.exe -f=*.log -p=findMe
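
Note that the default startPattern quoted above expects a timestamp like 04 Feb 2017 15:02:50,778, while the log in the question uses 01 Feb 2010 - 10:39:01.755 (a " - " after the date and a dot before the milliseconds), so a custom start pattern would presumably be needed for that file. A rough way to try out a candidate pattern before handing it to the tool is bash's =~ operator. This is only a sketch: bash uses POSIX ERE rather than .NET regex syntax, so [\w] is written as [[:alnum:]_] here, and question_pattern and starts_entry are names made up for illustration.

```shell
#!/bin/bash
# ERE rendering of the default startPattern ([\w] written as [[:alnum:]_]).
default_pattern='^[0-9]{2} [[:alnum:]_]{3} [0-9]{4} [0-9]{2}:[0-9]{2}:[0-9]{2},[0-9]{3}'
# Hypothetical pattern adapted to the timestamp format used in the question.
question_pattern='^[0-9]{2} [[:alnum:]_]{3} [0-9]{4} - [0-9]{2}:[0-9]{2}:[0-9]{2}\.[0-9]{3}'

# starts_entry PATTERN LINE: succeeds if LINE starts a new log entry under PATTERN.
starts_entry() {
    [[ $2 =~ $1 ]]
}

starts_entry "$default_pattern" '04 Feb 2017 15:02:50,778 some message' \
    && echo "default pattern matches its own format"
starts_entry "$default_pattern" '01 Feb 2010 - 10:39:01.755, DEBUG - KEY1:randomtext' \
    || echo "default pattern does not match the format in the question"
starts_entry "$question_pattern" '01 Feb 2010 - 10:39:01.755, DEBUG - KEY1:randomtext' \
    && echo "adapted pattern matches the format in the question"
```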


Adding on to ghostdog74's answer (thank you very much btw, it works great).

This version takes command-line input in the form "./parse file key" and handles the ERROR log level as well as DEBUG:

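#!/bin/bash
# Usage: ./parse file key
awk -v key="$2" '
# A DEBUG or ERROR line without the key starts another entry: stop printing
$0 ~ /DEBUG|ERROR/ && $0 !~ key { f = 0 }
# A line with the key: start printing
$0 ~ key { f = 1 }
f { print } ' "$1"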