Extracting a subsection from a String in Java


Developer https://www.devze.com 2023-02-23 13:32 Source: Internet


Related topic: regex

I have one huge string with the form:

MarkerBeg 1
... ...
MarkerEnd 1
MarkerBeg 2
...
MarkerEnd 2

I have this information in a string and want to extract the text between each pair of markers (the ... parts). Is there any way to do this using a regex, or with simple String methods that look for each marker?

Regards,

[Edited because question became clearer]

Here's the problem as I understand it: You have a long string with some blocks of text in it delimited by "MarkerBeg [identifier]" and "MarkerEnd [identifier]". Not all the text in the string is inside one of these blocks, and the blocks cannot be nested. The identifiers can be some arbitrary string (here I'm assuming that they only contain characters in the \w class: letters, numbers, and underscores). You need to extract both the identifiers and the strings inside the blocks.

Here's some code that will do what you want:

import java.util.regex.*;

public class Hello {
    public static void main(String[] args) {
        String s = "MarkerBeg 1\n some text\nMarkerEnd 1\nxxx\nMarkerBeg 2\nhi there :) \nMarkerEnd 2\nxyz\nMarkerBeg hello\nzgfds\nMarkerEnd hello";
        System.out.println("source string:\n" + s);
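        // Group 1 captures the identifier and group 2 the block contents; \1 requires
        // MarkerEnd to repeat the identifier. Without Pattern.DOTALL, '.' does not match
        // line terminators, so each block's contents must fit on a single line.
        Pattern p = Pattern.compile("MarkerBeg\\s+(\\w+)\\s+(.*)\\s+MarkerEnd\\s+\\1");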
        Matcher m = p.matcher(s);
        System.out.println("\nextracted:");
        while (m.find()) {
            String ident = m.group(1);
            String string = m.group(2);
            System.out.println(ident + ": " + string);
        }
    }
}

This prints out:

source string:
MarkerBeg 1
 some text
MarkerEnd 1
xxx
MarkerBeg 2
hi there :) 
MarkerEnd 2
xyz
MarkerBeg hello
zgfds
MarkerEnd hello

extracted:
1: some text
2: hi there :) 
hello: zgfds

The regex works as follows:
regex: "MarkerBeg\s+(\w+)\s+(.*)\s+MarkerEnd\s+\1" (in the Java source above, each backslash is doubled inside the string literal)
MarkerBeg: taken literally
\s+: one or more whitespace characters
(\w+): one or more letters, digits, or underscores, placed in a capturing group to extract later (your identifier)
\s+: as above
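(.*): zero or more characters, placed in a capturing group (the contents of your text block); by default . does not match line terminators, so this only works when each block's contents sit on one line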
\s+: as above
MarkerEnd: taken literally
\s+: as above
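\1: a backreference to the first capturing group (i.e. your identifier), so MarkerEnd must be followed by the same text that followed MarkerBeg; note that it also matches the start of a longer word, e.g. with identifier 1 it would accept MarkerEnd 12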
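Because . does not cross line breaks by default, the pattern above only finds blocks whose contents fit on one line, whereas the question's blocks (... ...) may span several lines. The following is a minimal sketch, assuming the same MarkerBeg/MarkerEnd layout and unique identifiers, of two ways to handle that. The first reuses the answer's regex but compiles it with Pattern.DOTALL, makes the content group reluctant (.*?) so each match ends at the nearest closing marker, and adds \b after \1 so identifier 1 does not close on MarkerEnd 12. The second is a plain indexOf loop for the "simple String methods" route the question mentions. The class MarkerBlocks and the methods extractWithRegex and extractWithIndexOf are hypothetical names chosen for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MarkerBlocks {
    // DOTALL lets '.' match newlines, the reluctant (.*?) stops at the nearest
    // closing marker, and \b keeps identifier "1" from closing on "MarkerEnd 12".
    private static final Pattern BLOCK = Pattern.compile(
            "MarkerBeg\\s+(\\w+)\\s+(.*?)\\s+MarkerEnd\\s+\\1\\b", Pattern.DOTALL);

    // Regex version: identifier -> contents, in order of appearance.
    // Assumes identifiers are unique; a repeated identifier overwrites the earlier entry.
    static Map<String, String> extractWithRegex(String s) {
        Map<String, String> blocks = new LinkedHashMap<>();
        Matcher m = BLOCK.matcher(s);
        while (m.find()) {
            blocks.put(m.group(1), m.group(2));
        }
        return blocks;
    }

    // indexOf version: assumes "MarkerBeg <id>" is followed by a newline and the
    // block ends at the next "MarkerEnd <id>"; unlike the regex it does not check
    // that the identifier stops there, and it gives up at the first malformed block.
    static Map<String, String> extractWithIndexOf(String s) {
        final String beg = "MarkerBeg ";
        Map<String, String> blocks = new LinkedHashMap<>();
        int from = 0;
        while (true) {
            int start = s.indexOf(beg, from);
            if (start < 0) break;                      // no more blocks
            int lineEnd = s.indexOf('\n', start);
            if (lineEnd < 0) break;                    // marker line with no body
            String id = s.substring(start + beg.length(), lineEnd).trim();
            String end = "MarkerEnd " + id;
            int endPos = s.indexOf(end, lineEnd);
            if (endPos < 0) break;                     // unterminated block
            blocks.put(id, s.substring(lineEnd + 1, endPos).trim());
            from = endPos + end.length();
        }
        return blocks;
    }

    public static void main(String[] args) {
        String s = "MarkerBeg 1\nline one\nline two\nMarkerEnd 1\nxxx\n"
                 + "MarkerBeg 2\nhi there :)\nMarkerEnd 2";
        // For this input both helpers report the same identifier/contents pairs.
        extractWithRegex(s).forEach((id, text) -> System.out.println(id + ": " + text));
        extractWithIndexOf(s).forEach((id, text) -> System.out.println(id + ": " + text));
    }
}
```

Returning a LinkedHashMap keeps the blocks in the order they appear; if identifiers can repeat, collecting into a list of pairs instead would avoid the overwrite. The indexOf loop is more verbose than the regex, but it is easy to follow and to adapt if the marker lines always have a fixed format.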