
Extract data and format them using RegEx

开发者 (Developer) https://www.devze.com 2023-01-25 08:15 Source: web

I have three strings that I need to combine.

I have an input string (string 1) on which I have to run a regex with capturing groups (string 2), and then insert the captured groups into a template (string 3) using backreferences.

A short example:

input: "foo1234bar5678"
regex: ".*?(\\d*).*?(\\d*).*"
template: "answer: $1 $2"

which should expand to "answer: 1234 5678".

I have been using java.util.regex.Pattern, but I can't figure out a way to do this with matchers. Obviously, replaceAll is not the expected behaviour, nor are the append* methods.

Is there a way to do this nicely using the Android API?

EDIT: Here is a basic implementation:

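import java.util.regex.Matcher;
import java.util.regex.Pattern;

public static String genOutput(String regex, String input, String template) {
    Pattern p = Pattern.compile(regex);
    Matcher m = p.matcher(input);
    if (m.find()) {
        // Substitute each $i in the template with the text of group i.
        for (int i = 1; i <= m.groupCount(); i++) {
            // quoteReplacement keeps '$' and '\' in the captured text literal.
            template = template.replaceAll("\\$" + i, Matcher.quoteReplacement(m.group(i)));
        }
    }
    return template;
}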


Here is how I would do it:

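// Requires java.util.regex.Matcher and java.util.regex.Pattern.
// Two explicit groups, each capturing one run of digits.
Pattern p = Pattern.compile("\\D*(\\d+)\\D*(\\d+)");
Matcher m = p.matcher(input);
if (m.find()) {
    StringBuilder result = new StringBuilder("answer:");
    // Group indices run from 1 to groupCount() inclusive.
    for (int i = 1; i <= m.groupCount(); i++) {
        result.append(' ').append(m.group(i));
    }
    System.out.println(result);
} else {
    System.out.println("Input did not match");
}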

This matches your string and then appends each captured group to the result.
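
If the number of digit runs is not known in advance, keep in mind that a capturing group placed inside a repeated group only retains its last capture. A minimal sketch of an alternative, assuming you simply want every run of digits in order, is to call find() in a loop. The class name DigitRuns and the methods digitRuns and answer are made up for this illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DigitRuns {
    // Collects every maximal run of digits in the input, in order of appearance.
    static List<String> digitRuns(String input) {
        List<String> runs = new ArrayList<>();
        Matcher m = Pattern.compile("\\d+").matcher(input);
        while (m.find()) {
            runs.add(m.group());
        }
        return runs;
    }

    // Builds "answer: <run1> <run2> ..." (hypothetical output format taken
    // from the example in the question).
    static String answer(String input) {
        StringBuilder sb = new StringBuilder("answer:");
        for (String run : digitRuns(input)) {
            sb.append(' ').append(run);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(answer("foo1234bar5678")); // prints "answer: 1234 5678"
    }
}
```

This does not use the template at all, so it only fits when the output format is fixed.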


.*(\d*).*(\d*).*

Your problem is that your regex includes quantifiers but nothing for them to repeat...

The above regex will do what you want.


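Digging through Android's libcore, I found a private method, appendEvaluated, in java.util.regex.Matcher that does the job, so I copied it into my code. Note that group() is a method of Matcher, so outside that class the call has to be made on your own Matcher instance.

Here it is: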

private void appendEvaluated(StringBuffer buffer, String s) {
    boolean escape = false;
    boolean dollar = false;

    for (int i = 0; i < s.length(); i++) {
        char c = s.charAt(i);
        if (c == '\\' && !escape) {
            escape = true;
        } else if (c == '$' && !escape) {
            dollar = true;
        } else if (c >= '0' && c <= '9' && dollar) {
            buffer.append(group(c - '0'));
            dollar = false;
        } else {
            buffer.append(c);
            dollar = false;
            escape = false;
        }
    }

    // This seemingly stupid piece of code reproduces a JDK bug.
    if (escape) {
        throw new ArrayIndexOutOfBoundsException(s.length());
    }
}
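
As a possible alternative to copying private code, the public Matcher.appendReplacement method performs the same $n substitution. Here is a rough sketch, assuming only the first match matters; the class name TemplateExpander, the method expand and the choice to return null on no match are my own, and the regex uses \d+ rather than the \d* from the question so that the groups cannot match empty.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TemplateExpander {
    // Expands $1, $2, ... in template using the first match of regex in input.
    // Returns null when the regex does not match (a choice made for this sketch).
    static String expand(String regex, String input, String template) {
        Matcher m = Pattern.compile(regex).matcher(input);
        if (!m.find()) {
            return null;
        }
        StringBuffer sb = new StringBuffer();
        // Appends input[0, m.start()) followed by the template with its
        // group references substituted.
        m.appendReplacement(sb, template);
        // Drop the unmatched prefix so only the expanded template remains.
        return sb.substring(m.start());
    }

    public static void main(String[] args) {
        String out = expand("\\D*(\\d+)\\D*(\\d+).*", "foo1234bar5678", "answer: $1 $2");
        System.out.println(out); // prints "answer: 1234 5678"
    }
}
```

A literal dollar sign or backslash in the template still has to be escaped with a backslash, as with replaceAll.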


The following modification will help you too:

.*?(\d*).*?(\d*).*

The question mark makes the quantifier lazy: it matches as few characters as possible. Without it the quantifier is greedy and matches as many as possible, so .* consumes the whole string.

Also remember that backslashes must be doubled in Java string literals: one backslash for Java, the next one for the regex, i.e. Pattern.compile(".*?(\\d*).*?(\\d*).*");
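
To make the greedy versus lazy difference concrete, here is a small sketch that runs three variants against the input from the question. The class GreedyVsLazy and the helper groups are made up for this illustration. Note that with \d* a group is allowed to match the empty string, which is why the first two variants use \d+ instead (my substitution, not part of the answer above).

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GreedyVsLazy {
    // Returns "<group 1>|<group 2>" for a full match, or null if there is none.
    static String groups(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.matches() ? m.group(1) + "|" + m.group(2) : null;
    }

    public static void main(String[] args) {
        String input = "foo1234bar5678";
        // Greedy .* leaves each group only the minimum it needs.
        System.out.println(groups(".*(\\d+).*(\\d+).*", input));   // prints "7|8"
        // Lazy .*? stops at the first digit, so each group gets a full run.
        System.out.println(groups(".*?(\\d+).*?(\\d+).*", input)); // prints "1234|5678"
        // With \d* both groups can match empty at the very start.
        System.out.println(groups(".*?(\\d*).*?(\\d*).*", input)); // prints "|"
    }
}
```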
