capture text, including tags from string, and then reorder tags with text

Developer (开发者) https://www.devze.com 2023-01-03 16:37 Source: Internet

Related topic: regex

I have the following text:

abcabcabcabc<2007-01-12><name1><2007-01-12>abcabcabcabc<name2><2007-01-11>abcabcabcabc<name3><2007-02-12>abcabcabcabc<name4>abcabcabcabc<2007-03-12><name5><date>abcabcabcabc<name6>

I need to use regular expressions in order to clean the above text:

The basic extraction rule is:

<2007-01-12>abcabcabcabc<name2>

I have no problem extracting this pattern. My issue is that the text contains malformed sequences: if a sequence doesn't start with a date and end with a name, my extraction fails. For example, the text above may have several malformed sequences, such as:

abcabcabcabc<2007-01-12><name1>

Should be:

<2007-01-12>abcabcabcabc<name1>

Is it possible to have a regular expression that would clean the above prior to extracting my consistent pattern? In short, I need to find all malformed sequences, and then take the date tag and put it in front of the text, as in the example above.

Thanks.

Do you need something like this perhaps?

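public class Extract {
    public static void main(String[] args) {
        // Sample input: some records are well formed (<date>text<name>),
        // others have the text in front of the date tag (text<date><name>).
        String text =
            "abcabcabcabc<2007-01-12><name1>" +
            "<2007-01-12>abcabcabcxxx<name2>" +
            "<2007-01-11>abcabcabcyyy<name3>" +
            "<2007-02-12>abcabcabczzz<name4>" +
            "abcabcabc123<2007-03-12><name5>" +
            "<date>abcabcabc456<name6>";
        System.out.println(
            text.replaceAll(
                // Expanding the "text" placeholder gives the pattern
                // ([^<]*)<([^<]*)>([^<]*)<([^<]*)>
                "(text)<(text)>(text)<(text)>"
                    .replace("text", "[^<]*"),
                // $1 and $3 are the text before and after the first tag;
                // $2 is the first tag's content, $4 the second's.
                "$1$3 - $2 - $4\n"
            )
        );
    }
}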

This prints:

abcabcabcabc - 2007-01-12 - name1
abcabcabcxxx - 2007-01-12 - name2
abcabcabcyyy - 2007-01-11 - name3
abcabcabczzz - 2007-02-12 - name4
abcabcabc123 - 2007-03-12 - name5
abcabcabc456 - date - name6
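
If what you want is the cleaned-up string itself rather than a report, the same pattern can probably be reused with a different replacement. The following is only a sketch: the class name NormalizeTags and the method normalize are made up for illustration, and it assumes every record has exactly two tags, with the date tag first and the name tag second. I also used [^<>]* instead of [^<]* so a group can never swallow a closing bracket; that is my own tweak.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of the original answer.
class NormalizeTags {
    // One record: optional text, first tag, optional text, second tag.
    private static final Pattern RECORD =
        Pattern.compile("([^<>]*)<([^<>]*)>([^<>]*)<([^<>]*)>");

    // Rewrites every record as <first tag>text<second tag>.
    // Assumption: the first tag is the date and the second is the name.
    static String normalize(String text) {
        // $1 and $3 are the text on either side of the first tag;
        // at most one of them is non-empty in the sample data.
        return RECORD.matcher(text).replaceAll("<$2>$1$3<$4>");
    }

    public static void main(String[] args) {
        String text =
            "abcabcabcabc<2007-01-12><name1>" +   // malformed: text before date
            "<2007-01-12>abcabcabcxxx<name2>" +   // already well formed
            "abcabcabc123<2007-03-12><name5>";    // malformed: text before date
        System.out.println(normalize(text));
    }
}
```

Records that are already well formed pass through unchanged, so you can run this over the whole string before applying your existing extraction rule.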

Essentially, there are 3 parts: