I have one huge string with the form:
MarkerBeg 1
... ...
MarkerEnd 1
MarkerBeg 2
...
MarkerEnd 2
I have this information in a string and want to extract the text between each pair of markers (the "..." parts). Is there a way to do this with a regex, or with simple String methods that look for each marker?
Regards,
[Edited because question became clearer]
Here's the problem as I understand it:
You have a long string with some blocks of text in it delimited by "MarkerBeg [identifier]" and "MarkerEnd [identifier]". Not all the text in the string is inside one of these blocks, and the blocks cannot be nested. The identifiers can be arbitrary strings (here I'm assuming that they only contain characters in the \w class: letters, numbers, and underscores). You need to extract both the identifiers and the strings inside the blocks.
Here's some code that will do what you want:
    import java.util.regex.*;

    public class Hello {
        public static void main(String[] args) {
            String s = "MarkerBeg 1\n some text\nMarkerEnd 1\nxxx\nMarkerBeg 2\nhi there :) \nMarkerEnd 2\nxyz\nMarkerBeg hello\nzgfds\nMarkerEnd hello";
            System.out.println("source string:\n" + s);

            // Group 1 captures the identifier, group 2 the block contents;
            // \\1 requires MarkerEnd to carry the same identifier as MarkerBeg.
            Pattern p = Pattern.compile("MarkerBeg\\s+(\\w+)\\s+(.*)\\s+MarkerEnd\\s+\\1");
            Matcher m = p.matcher(s);

            System.out.println("\nextracted:");
            // Each successful find() advances to the next block in the string.
            while (m.find()) {
                String ident = m.group(1);
                String string = m.group(2);
                System.out.println(ident + ": " + string);
            }
        }
    }
This prints out:
    source string:
    MarkerBeg 1
     some text
    MarkerEnd 1
    xxx
    MarkerBeg 2
    hi there :) 
    MarkerEnd 2
    xyz
    MarkerBeg hello
    zgfds
    MarkerEnd hello

    extracted:
    1: some text
    2: hi there :) 
    hello: zgfds
The regex works as follows:
regex: "MarkerBeg\s+(\w+)\s+(.*)\s+MarkerEnd\s+\1"
(in the Java source, each backslash is doubled)
MarkerBeg: taken literally
\s+: one or more whitespace characters
(\w+): one or more letter, number, or underscore characters, placed in a capturing group to extract later (your identifier)
\s+: as above
(.*): zero or more characters other than line terminators (DOTALL is not set), placed in a capturing group (the contents of your text block)
\s+: as above
MarkerEnd: taken literally
\s+: as above
\1: the contents of the first capturing group (i.e. your identifier)
Then in the loop, I use m.group(1) and m.group(2) to get the contents of the first and second capturing groups.
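One caveat: without the DOTALL flag, "." does not match line terminators, so the pattern above only captures blocks whose contents fit on a single line. If your blocks can span several lines, a variant along the following lines may help. This is only a sketch: the class name MultiLineBlocks and the helper extract are made up for illustration. It turns on Pattern.DOTALL, switches to the reluctant quantifier ".*?" so each match stops at the nearest matching MarkerEnd rather than the last one in the string, and appends \b so that "MarkerEnd 1" is not matched inside "MarkerEnd 10".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical class name, used only for this sketch.
class MultiLineBlocks {
    // DOTALL: '.' also matches line terminators.
    // ".*?": reluctant, so the match ends at the nearest MarkerEnd with the same identifier.
    // "\\b": the identifier after MarkerEnd must end at a word boundary.
    private static final Pattern BLOCK = Pattern.compile(
            "MarkerBeg\\s+(\\w+)\\s+(.*?)\\s+MarkerEnd\\s+\\1\\b", Pattern.DOTALL);

    // Returns one "identifier: contents" entry per block found in s.
    static List<String> extract(String s) {
        List<String> blocks = new ArrayList<>();
        Matcher m = BLOCK.matcher(s);
        while (m.find()) {
            blocks.add(m.group(1) + ": " + m.group(2));
        }
        return blocks;
    }

    public static void main(String[] args) {
        String s = "MarkerBeg 1\nline one\nline two\nMarkerEnd 1\nxxx\nMarkerBeg 2\nhi\nMarkerEnd 2";
        for (String block : extract(s)) {
            System.out.println(block);
        }
    }
}
```

With the sample string in main, the first block's contents are the two lines "line one" and "line two", and the second block's contents are "hi".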
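Since the question also asks about simple String methods, here is a rough sketch of the same extraction using only indexOf and substring. It rests on assumptions that go beyond the original post: each marker is followed by exactly one space, the identifier is the rest of that line, and no identifier is a prefix of another identifier whose end marker appears inside the block. The names MarkerScanner and extract are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical class name, used only for this sketch.
class MarkerScanner {
    static final String BEG = "MarkerBeg ";
    static final String END = "MarkerEnd ";

    // Maps each identifier to the trimmed text between its markers,
    // in order of appearance. A repeated identifier overwrites the earlier entry.
    static Map<String, String> extract(String s) {
        Map<String, String> blocks = new LinkedHashMap<>();
        int pos = 0;
        while (true) {
            int beg = s.indexOf(BEG, pos);
            if (beg < 0) {
                break;                          // no more begin markers
            }
            int idStart = beg + BEG.length();
            int idEnd = s.indexOf('\n', idStart);
            if (idEnd < 0) {
                break;                          // begin marker on the last line: no body
            }
            String id = s.substring(idStart, idEnd).trim();
            String endMarker = END + id;
            int end = s.indexOf(endMarker, idEnd);
            if (end < 0) {
                pos = idEnd;                    // unterminated block: skip it
                continue;
            }
            blocks.put(id, s.substring(idEnd, end).trim());
            pos = end + endMarker.length();
        }
        return blocks;
    }

    public static void main(String[] args) {
        String s = "MarkerBeg 1\n some text\nMarkerEnd 1\nxxx\nMarkerBeg 2\nhi there :) \nMarkerEnd 2";
        for (Map.Entry<String, String> e : extract(s).entrySet()) {
            System.out.println(e.getKey() + ": " + e.getValue());
        }
    }
}
```

Unlike the regex version, this one trims the block contents, so leading and trailing whitespace inside a block is dropped. The regex is more compact and enforces the identifier match for you; the indexOf version may be easier to step through in a debugger.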