Java RegEx with lookahead failing

Developer https://www.devze.com 2023-02-11 08:24 Source: web
In Java, I was unable to get a regex to behave the way I wanted, and wrote this little JUnit test to demonstrate the problem:

public void testLookahead() throws Exception {
    Pattern p = Pattern.compile("ABC(?!!)");
    assertTrue(p.matcher("ABC").find());
    assertTrue(p.matcher("ABCx").find());
    assertFalse(p.matcher("ABC!").find());
    assertFalse(p.matcher("ABC!x").find());
    assertFalse(p.matcher("blah/ABC!/blah").find());

    p = Pattern.compile("[A-Z]{3}(?!!)");
    assertTrue(p.matcher("ABC").find());
    assertTrue(p.matcher("ABCx").find());
    assertFalse(p.matcher("ABC!").find());
    assertFalse(p.matcher("ABC!x").find());
    assertFalse(p.matcher("blah/ABC!/blah").find());

    p = Pattern.compile("[A-Z]{3}(?!!)", Pattern.CASE_INSENSITIVE);
    assertTrue(p.matcher("ABC").find());
    assertTrue(p.matcher("ABCx").find());
    assertFalse(p.matcher("ABC!").find());
    assertFalse(p.matcher("ABC!x").find());
    assertFalse(p.matcher("blah/ABC!/blah").find()); //fails, why?

    p = Pattern.compile("[A-Za-z]{3}(?!!)");
    assertTrue(p.matcher("ABC").find());
    assertTrue(p.matcher("ABCx").find());
    assertFalse(p.matcher("ABC!").find());
    assertFalse(p.matcher("ABC!x").find());
    assertFalse(p.matcher("blah/ABC!/blah").find());  //fails, why?
}

Every assertion passes except for the two marked with the comment. The four groups of assertions are identical except for the pattern string. Why would adding case-insensitivity break the matcher?


Your tests fail because, in both cases, the patterns [A-Z]{3}(?!!) (with CASE_INSENSITIVE) and [A-Za-z]{3}(?!!) find at least one match in "blah/ABC!/blah": they match bla twice, once in each blah.

A simple test shows this:

Pattern p = Pattern.compile("[A-Z]{3}(?!!)", Pattern.CASE_INSENSITIVE);
Matcher m = p.matcher("blah/ABC!/blah");
while(m.find()) {
    System.out.println(m.group());
}

prints:

bla
bla


Those two calls to find() don't return false because the full string contains substrings that match the pattern. Specifically, bla (the first three letters of each blah) matches the regular expression: three letters not followed by an exclamation mark. The case-sensitive patterns correctly find nothing because blah isn't upper-case, and the only upper-case run, ABC, is followed by an exclamation mark.
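To make the difference between the patterns concrete, here is a minimal sketch that lists every match together with its start index for the three pattern variants from the question. The class name LookaheadDemo and the helper findAll are hypothetical names chosen for this illustration; only Pattern and Matcher come from java.util.regex.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; not part of the original question or answers.
public class LookaheadDemo {

    // Collects every match of p in input, formatted as "text@startIndex".
    static List<String> findAll(Pattern p, String input) {
        List<String> hits = new ArrayList<>();
        Matcher m = p.matcher(input);
        while (m.find()) {
            hits.add(m.group() + "@" + m.start());
        }
        return hits;
    }

    public static void main(String[] args) {
        String input = "blah/ABC!/blah";

        // Case-sensitive: ABC is the only upper-case run, and '!' follows it.
        System.out.println(findAll(Pattern.compile("[A-Z]{3}(?!!)"), input));
        // prints []

        // Case-insensitive: the lower-case letters of "blah" now qualify.
        System.out.println(findAll(
                Pattern.compile("[A-Z]{3}(?!!)", Pattern.CASE_INSENSITIVE), input));
        // prints [bla@0, bla@10]

        // An explicit two-case character class behaves the same way.
        System.out.println(findAll(Pattern.compile("[A-Za-z]{3}(?!!)"), input));
        // prints [bla@0, bla@10]
    }
}
```

The start indices show that neither match touches ABC: find() succeeds on the surrounding text, which is why assertFalse fails for those two patterns.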
