I have a String with multiline content and want to select a multiline region, preferably using a regular expression (just because I'm trying to understand Java RegEx at the moment).
Consider input like this:
Line 1
abc START def
Line 2
Line 3
gh END jklm
Line 4
Assuming START and END are unique and are the start/end markers for the region, I'd like to create a pattern/matcher to get the result:
def
Line 2
Line 3
gh
My current attempt is:
Pattern p = Pattern.compile("START(.*)END");
Matcher m = p.matcher(input);
if (m.find())
    System.out.println(m.group(1));
But the result is:
gh
So m.start() seems to point at the beginning of the line that contains the 'end marker'. I tried to add Pattern.MULTILINE to the compile call, but that (alone) didn't change anything.
Where is my mistake?
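You want Pattern.DOTALL, so that . matches newline characters as well. MULTILINE addresses a different issue: it makes the ^ and $ anchors match at the start and end of each line instead of only at the start and end of the whole input.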
Pattern p = Pattern.compile("START(.*)END", Pattern.DOTALL);
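To make the difference between the two flags concrete, here is a minimal sketch run against the sample input from the question. The class name DotallDemo and the helper between are made up for illustration; only Pattern and Matcher come from java.util.regex. On this input the pattern finds nothing at all unless DOTALL is set, because START and END sit on different lines.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DotallDemo {
    // Hypothetical helper: returns what group 1 captured between START and END,
    // or null if the pattern does not match at all.
    static String between(String input, int flags) {
        Pattern p = Pattern.compile("START(.*)END", flags);
        Matcher m = p.matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String input = "Line 1\nabc START def\nLine 2\nLine 3\ngh END jklm\nLine 4";

        // No flags: '.' stops at line terminators, so START...END cannot be bridged.
        System.out.println(between(input, 0));                 // prints "null"

        // MULTILINE only changes ^ and $, which this pattern does not use.
        System.out.println(between(input, Pattern.MULTILINE)); // prints "null"

        // DOTALL lets '.' match '\n', so the whole region is captured.
        System.out.println(between(input, Pattern.DOTALL));
    }
}
```

The last call prints the four-line region, including the space after START and the space before END.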
You want to set Pattern.DOTALL (so that your . wildcard also matches end-of-line characters); see this test:
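// Imports assume JUnit 4; adjust them if you use a different test framework.
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import org.junit.Assert;
import org.junit.Test;

@Test
public void testMultilineRegex() throws Exception {
    final String input = "Line 1\nabc START def\nLine 2\nLine 3\ngh END jklm\nLine 4";
    // The captured group keeps the space after START and the space before END.
    final String expected = " def\nLine 2\nLine 3\ngh ";
    final Pattern p = Pattern.compile("START(.*)END", Pattern.DOTALL);
    final Matcher m = p.matcher(input);
    if (m.find()) {
        Assert.assertEquals(expected, m.group(1));
    } else {
        Assert.fail("pattern not found");
    }
}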
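Note that the captured group in this test keeps the space after START and the space before END, whereas the result asked for in the question has neither. One possible variation (my own assumption, not something the test requires) is to consume the whitespace outside the group and make the group reluctant. The class and method names below are illustrative only.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TrimmedRegionDemo {
    // \s* after START and before END swallows the surrounding whitespace;
    // the reluctant (.*?) stops as soon as "\s*END" can match.
    private static final Pattern REGION =
            Pattern.compile("START\\s*(.*?)\\s*END", Pattern.DOTALL);

    // Hypothetical helper: returns the region without the marker padding,
    // or null if the markers are not found.
    static String region(String input) {
        Matcher m = REGION.matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String input = "Line 1\nabc START def\nLine 2\nLine 3\ngh END jklm\nLine 4";
        System.out.println(region(input));
    }
}
```

With the sample input this prints the four lines def, Line 2, Line 3 and gh, with no leading or trailing space. The reluctant quantifier also means the match ends at the first END, which only matters if END could occur more than once.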
The regex metacharacter . does not match a newline by default. You can try the regex:
START([\w\W]*)END
which uses [\w\W] in place of the dot. [\w\W] is a character class that matches any character that is either a word character or a non-word character, so it effectively matches everything, including newlines.
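As a rough sketch of how that looks in Java source: the backslashes have to be doubled inside the string literal, and no flag is passed to Pattern.compile. The class name CharClassDemo and the helper between are invented for this example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CharClassDemo {
    // Hypothetical helper: same idea as the DOTALL version, but the
    // "match anything" part is spelled as the character class [\w\W].
    static String between(String input) {
        Pattern p = Pattern.compile("START([\\w\\W]*)END"); // no flags needed
        Matcher m = p.matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String input = "Line 1\nabc START def\nLine 2\nLine 3\ngh END jklm\nLine 4";
        // Prints the region between the markers, spaces included.
        System.out.println(between(input));
    }
}
```

The captured text is the same as with "START(.*)END" compiled under Pattern.DOTALL; which spelling to prefer is mostly a matter of readability.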