开发者

RegEx - problem with multiline input

开发者 https://www.devze.com 2023-01-15 08:14 Source: web
I have a String with multiline content and want to select a multiline region, preferably using a regular expression (just because I'm trying to understand Java RegEx at the moment).

Consider the input like:

Line 1
abc START def
Line 2
Line 3
gh END jklm
Line 4

Assuming START and END are unique and the start/end markers for the region, I'd like to create a pattern/matcher to get the result:

 def
Line 2
Line 3
gh 

My current attempt is

Pattern p = Pattern.compile("START(.*)END");
Matcher m = p.matcher(input);
if (m.find())
  System.out.println(m.group(1));

But the result is

gh

So m.start() seems to point at the beginning of the line that contains the 'end marker'. I tried to add Pattern.MULTILINE to the compile call but that (alone) didn't change anything.

Where is my mistake?


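You want Pattern.DOTALL, so that . also matches newline characters. MULTILINE addresses a different issue: it makes the ^ and $ anchors match at the start and end of each line rather than only at the start and end of the whole input.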

Pattern p = Pattern.compile("START(.*)END", Pattern.DOTALL);
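To make the difference between the two flags concrete, here is a minimal sketch that runs the pattern against the sample input from the question. The class name DotallVsMultiline and the helper firstGroup are hypothetical names introduced only for this illustration; the input string is assumed to be exactly the six lines shown in the question, joined with \n.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, not part of the original question or answers.
public class DotallVsMultiline {
    // Assumed input: the six sample lines from the question, joined with \n.
    static final String INPUT =
        "Line 1\nabc START def\nLine 2\nLine 3\ngh END jklm\nLine 4";

    // Returns group 1 of the first match, or null when nothing matches.
    static String firstGroup(String regex, int flags, String input) {
        Matcher m = Pattern.compile(regex, flags).matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // No flags: . stops at line terminators, so START and END on
        // different lines can never be bridged by .*
        System.out.println(firstGroup("START(.*)END", 0, INPUT)); // null

        // MULTILINE only changes ^ and $, so it does not help here.
        System.out.println(firstGroup("START(.*)END", Pattern.MULTILINE, INPUT)); // null

        // DOTALL lets . match line terminators, so the region is captured.
        System.out.println(firstGroup("START(.*)END", Pattern.DOTALL, INPUT));

        // What MULTILINE is for: anchoring at the boundaries of an inner line.
        System.out.println(firstGroup("^(Line 3)$", 0, INPUT));                 // null
        System.out.println(firstGroup("^(Line 3)$", Pattern.MULTILINE, INPUT)); // Line 3
    }
}
```

The third call prints the four-line region between the markers, including the leading space before "def" and the trailing space after "gh".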


You want to set Pattern.DOTALL so that the . wildcard also matches end-of-line characters. See this test:

@Test
public void testMultilineRegex() throws Exception {
    final String input = "Line 1\nabc START def\nLine 2\nLine 3\ngh END jklm\nLine 4";
    final String expected = " def\nLine 2\nLine 3\ngh ";
    final Pattern p = Pattern.compile("START(.*)END", Pattern.DOTALL);
    final Matcher m = p.matcher(input);
    if (m.find()) {
        Assert.assertEquals(expected, m.group(1));
    } else {
        Assert.fail("pattern not found");
    }
}


By default, the regex metacharacter . does not match a newline. You can try the regex:

START([\w\W]*)END

which uses [\w\W] in place of the dot (.).

[\w\W] is a character class that matches either a word character or a non-word character, so it effectively matches any character, including newlines.
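As a rough sketch of how this looks in Java source: inside a string literal each backslash has to be doubled, so the regex is written "START([\\w\\W]*)END". The example also compares it with the embedded flag (?s), which is the inline form of Pattern.DOTALL. The class name AnyCharClass and the helper region are hypothetical, and the input is assumed to be the sample from the question.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, not part of the original answers.
public class AnyCharClass {
    // Returns group 1 of the first match, or null when nothing matches.
    static String region(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // Assumed input: the sample lines from the question, joined with \n.
        String input = "Line 1\nabc START def\nLine 2\nLine 3\ngh END jklm\nLine 4";

        // [\w\W] written as a Java string literal: backslashes are doubled.
        String viaCharClass = region("START([\\w\\W]*)END", input);

        // (?s) switches on DOTALL from inside the pattern itself,
        // so no flag argument is needed in Pattern.compile.
        String viaInlineFlag = region("(?s)START(.*)END", input);

        // Both approaches capture the same multiline region.
        System.out.println(viaCharClass.equals(viaInlineFlag)); // true
    }
}
```

Neither variant needs a flag argument, which can be convenient when the pattern string is passed to code that calls Pattern.compile(regex) on your behalf.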
