
regex throws StackOverFlow Error

Developer https://www.devze.com 2023-01-16 14:52 Source: Internet
I have a simple regexp question. I have the following multiline string:

description: line1\r\nline2\r\n...

I am trying to find all the lines that come after description:, and I used the following regexp (and a few more):

description: ((.*\r\n){1,})

...without any success. Then I found that there is a 'Regexp StackOverflow' bug in Sun's Java (marked as won't fix), see Bug #5050507. Can anyone please provide me with the magic formula to overcome this annoying bug? Please note that the total length of the lines must exceed 818 bytes!


Since you want to match everything that follows the text description:, you can simply allow the dot to match newlines with Pattern.DOTALL:

description:\s(.*)

So, in Java:

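// Requires java.util.regex.Pattern and java.util.regex.Matcher
Pattern regex = Pattern.compile("description:\\s(.*)", Pattern.DOTALL);
Matcher regexMatcher = regex.matcher(subjectString);
String resultString = null;
if (regexMatcher.find()) {
    // group(1) holds everything after "description:" and one whitespace character
    resultString = regexMatcher.group(1);
}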

The only semantic difference from your regex (apart from the fact that it won't blow your stack) is that it also matches if whatever follows description: does not contain a newline. Also, your regex will not match the last line of the file unless it ends in a newline; mine will. Which behaviour is preferable is your decision.

Of course, your functionality could be emulated like this:

description:\s(.*\r\n)

but I doubt that that's really what you want. Or is it?
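To make the DOTALL approach above easier to try, here is a minimal self-contained sketch. The class name DescriptionExtractor and the method extract are hypothetical names chosen for illustration; only Pattern, Matcher, and Pattern.DOTALL come from the standard library.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, named for illustration only.
final class DescriptionExtractor {
    // With DOTALL the dot also matches \r and \n, so one greedy (.*)
    // consumes the rest of the input without repeating a group.
    private static final Pattern DESCRIPTION =
            Pattern.compile("description:\\s(.*)", Pattern.DOTALL);

    /** Returns everything after "description:" plus one whitespace, or null if absent. */
    static String extract(String input) {
        Matcher m = DESCRIPTION.matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // Build an input similar in size to the one from the question.
        StringBuilder sb = new StringBuilder("description: ");
        for (int i = 0; i < 1000; i++) {
            sb.append("line").append(i).append("\r\n");
        }
        String body = extract(sb.toString());
        System.out.println(body == null ? "no match" : body.length() + " chars captured");
    }
}
```

Note that this variant also captures a trailing last line that does not end in \r\n, which is the semantic difference described above.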


I can reproduce the error:

StringBuilder sb = new StringBuilder();
for (int i = 0; i < 1000; ++i)
{
    sb.append("j\r\n");
}
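String s = "description: " + sb.toString();
// Greedy repetition of a group: can throw StackOverflowError on this input
Pattern pattern = Pattern.compile("description: ((.*\r\n){1,})");
// Possessive alternative that avoids the overflow:
//Pattern pattern = Pattern.compile("description: ((?:.*\r\n)++)");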

Matcher matcher = pattern.matcher(s);
boolean b = matcher.find();
if (b) {
    System.out.println(matcher.group(1));
}

The quantifier {1,} is the same as +, so you should use + instead, but this still fails. To fix it you can (as Bat K. points out) change the + to ++, making it possessive; this disables backtracking into the group and prevents the stack overflow.
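The possessive fix can be sketched as a small standalone program. The class name PossessiveDemo and the method linesAfterDescription are hypothetical; the pattern is the commented-out one from the reproduction, written with regex escapes instead of literal CR/LF characters.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, named for illustration only.
final class PossessiveDemo {
    // (?:...)++ repeats a non-capturing group possessively: once an
    // iteration has matched, the engine never backtracks into it.
    private static final Pattern LINES =
            Pattern.compile("description: ((?:.*\\r\\n)++)");

    /** Returns all CRLF-terminated lines after "description: ", or null if none match. */
    static String linesAfterDescription(String input) {
        Matcher m = LINES.matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // Same input shape as the reproduction: 1000 short lines.
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 1000; i++) {
            sb.append("j\r\n");
        }
        String result = linesAfterDescription("description: " + sb);
        System.out.println(result != null && result.length() == sb.length());
    }
}
```

Unlike the DOTALL variant, this pattern still requires every captured line to end in \r\n, so it stays closer to the behaviour of the original regex.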
