I have this regex pattern:
<(\d+)>(\d+\.\d+|\d{4}\-\d+\-\d+\s+\d{2}:\d{2}:\d{2})(?:\..*?)*\s+(ALER|NOTI)
and this is my input (it does not match at all):
<150>2010-12-29 18:11:30.883 -0700 192.168.2.145 80 192.168.2.87 2795 "-" "-" GET HTTP 192.168.2.145 HTTP/1.1 200 36200 0 1038 544 192.168.2.221 80 540 SERVER DEFAULT PASSIVE VALID /joomla/ "-" http://192.168.2.145/joomla/index.php?option=com_content&view=a be4d44e8f3986183a87991398c1c212e=1; be4d44e8f3986183a87991398c1c212e=1
This correctly reports no match, but it takes far too long to return the result. Since I receive about a thousand log lines per second, each one has to be processed very quickly. Sometimes the CPU reaches 100%.
Can anyone help me solve this regex problem?
Thanks
You have catastrophic backtracking because of the large number of ways the expression (?:\..*?)* can match.
Potentially millions of combinations must be checked, and that number grows exponentially with the number of dots in your string. To fix it, you can change this:
(?:\..*?)*\s+
to this:
\..*\s
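As a rough sketch of that change in Python (the question does not say which regex engine is in use, so Python's re module is an assumption here), the replacement tail can be attached to the question's prefix and run against the log line from the question. ALERT_LINE is a made-up line for illustration only. Note that this tail requires a literal dot after the timestamp, which the original optional group did not.

```python
import re

# Prefix taken from the question's pattern (\- written as -, which is equivalent).
PREFIX = r"<(\d+)>(\d+\.\d+|\d{4}-\d+-\d+\s+\d{2}:\d{2}:\d{2})"

# Suggested tail: one literal dot, one greedy run, one whitespace character.
# There is no nested repetition, so a failed match backtracks linearly.
FIXED = re.compile(PREFIX + r"\..*\s(ALER|NOTI)")

# The log line from the question (it contains neither ALER nor NOTI).
LOG_LINE = (
    '<150>2010-12-29 18:11:30.883 -0700 192.168.2.145 80 192.168.2.87 2795 '
    '"-" "-" GET HTTP 192.168.2.145 HTTP/1.1 200 36200 0 1038 544 '
    '192.168.2.221 80 540 SERVER DEFAULT PASSIVE VALID /joomla/ "-" '
    'http://192.168.2.145/joomla/index.php?option=com_content&view=a '
    'be4d44e8f3986183a87991398c1c212e=1; be4d44e8f3986183a87991398c1c212e=1'
)

# Hypothetical matching line, invented for illustration only.
ALERT_LINE = "<150>2010-12-29 18:11:30.883 -0700 ALER example message"

print(FIXED.search(LOG_LINE))             # no match
print(FIXED.search(ALERT_LINE).groups())  # priority, timestamp, level
```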
It looks like you are looking for some date/time/etc. information about the ALER/NOTI lines. Couldn't you filter for the lines that contain ALER/NOTI first and parse only those? Then it would probably be a lot easier to run the regex on just those interesting lines (and it would probably simplify the regex).
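A minimal sketch of that pre-filtering idea, assuming Python and assuming a simplified pattern (not the one from the question): a plain substring test rejects most lines before the regex engine runs at all. The function name iter_alerts and the sample lines are hypothetical.

```python
import re

# Simplified pattern (an assumption, not the question's exact regex):
# priority, timestamp, then a lazy skip to the first whitespace + ALER/NOTI.
PATTERN = re.compile(
    r"^<(\d+)>(\d+\.\d+|\d{4}-\d+-\d+\s+\d{2}:\d{2}:\d{2}).*?\s(ALER|NOTI)"
)

def iter_alerts(lines):
    """Yield (priority, timestamp, level) for lines mentioning ALER or NOTI."""
    for line in lines:
        # Cheap substring check: skip lines that cannot possibly match.
        if "ALER" not in line and "NOTI" not in line:
            continue
        match = PATTERN.search(line)
        if match:
            yield match.groups()

# Hypothetical sample input: a shortened access-log line and one alert line.
sample = [
    "<150>2010-12-29 18:11:30.883 -0700 192.168.2.145 80 GET HTTP/1.1 200",
    "<13>2010-12-29 18:11:30.883 -0700 ALER example message",
]
results = list(iter_alerts(sample))
print(results)
```

The substring test is only a filter. A line can contain ALER or NOTI somewhere and still fail the regex, so the pattern itself should also avoid nested repetition.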
Since you didn't provide a working example, the only thing to go on as to why it's slow
is this:
(?:\..*?)*
which is bizarre. The metacharacter . matches anything, including a literal
period. That expression says: if there is a literal period, match it and everything up to the \s.
But the literal period is optional. That intent can be written without the repeated group:
(?:\.(?:(?!\s(?:ALER|NOTI)).)*?)?\s+(ALER|NOTI)
Which is itself rather bizarre. It is easier to read when expanded:
(?:
\.
(?:
(?!\s(?:ALER|NOTI)).
)*?
)?
\s+
(ALER|NOTI)
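The expanded form can be kept as-is in source code when the engine supports free-spacing mode. Here is a sketch using Python's re.VERBOSE flag (Python is an assumption, and the prefix from the question has been prepended so the pattern can run on a whole line). ALERT_LINE is a made-up example.

```python
import re

EXPANDED = re.compile(
    r"""
    <(\d+)>                                        # priority
    (\d+\.\d+|\d{4}-\d+-\d+\s+\d{2}:\d{2}:\d{2})   # timestamp
    (?:
        \.                          # a literal period ...
        (?:
            (?!\s(?:ALER|NOTI)).    # ... then any char not starting whitespace + level
        )*?
    )?                              # the whole period part is optional, never repeated
    \s+
    (ALER|NOTI)
    """,
    re.VERBOSE,
)

# The log line from the question (it contains neither ALER nor NOTI).
LOG_LINE = (
    '<150>2010-12-29 18:11:30.883 -0700 192.168.2.145 80 192.168.2.87 2795 '
    '"-" "-" GET HTTP 192.168.2.145 HTTP/1.1 200 36200 0 1038 544 '
    '192.168.2.221 80 540 SERVER DEFAULT PASSIVE VALID /joomla/ "-" '
    'http://192.168.2.145/joomla/index.php?option=com_content&view=a '
    'be4d44e8f3986183a87991398c1c212e=1; be4d44e8f3986183a87991398c1c212e=1'
)

# Hypothetical matching line, invented for illustration only.
ALERT_LINE = "<150>2010-12-29 18:11:30.883 -0700 NOTI example message"

print(EXPANDED.search(LOG_LINE))             # no match
print(EXPANDED.search(ALERT_LINE).groups())  # priority, timestamp, level
```

Because the outer group is optional rather than repeated, there is only one way to split the text after the timestamp, so a failed match does not explode into many alternatives.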