
Regex Nth match or Last Match if < N matches

开发者 https://www.devze.com 2023-03-12 06:09 Source: Internet

I'm trying to find the nth match, or the last match if there are fewer than n. n is determined within my program and the regex string is constructed with 'n' replaced by an integer.

Here is my best guess, but my repetition operator {1,n} always matches just once. I thought it would be greedy by default.

The basic regex would be:
distinctiveString[\s\S]*?value="([^"]*)"

So I modified it to this to try to get the nth one instead:
(?:distinctiveString[\s\S]*?){1,n}value="([^"]*)"

distinctiveString randomStuff value="val1"
moreRandomStuff
distinctiveString randomStuff value="val2"
moreRandomStuff
distinctiveString randomStuff value="val3"
moreRandomStuff
distinctiveString randomStuff value="val4"
moreRandomStuff
distinctiveString randomStuff value="val5"

So in this case what I want is with n = 2 I'd get 'val2', n = 5 I'd get 'val5', n = 8 I would also get 'val5'.

I'm passing my regular expression through an application layer, but I think it's being handed directly to Perl as is.


Try something like this:

(?:(?:[\s\S]*?distinctiveString){4}[\s\S]*?|(?:[\s\S]*distinctiveString)[\s\S]*?)value="([^"]*)"

which captures "val4" in match group 1 when there are at least four occurrences of distinctiveString, or "val3" (the last one) for the shorter input:

distinctiveString randomStuff value="val1"
moreRandomStuff
distinctiveString randomStuff value="val2"
moreRandomStuff
distinctiveString randomStuff value="val3"

A quick break down of the pattern:

(?:                                         #
  (?:[\s\S]*?distinctiveString){4}[\s\S]*?  # match 4 'distinctiveString's
  |                                         # OR
  (?:[\s\S]*distinctiveString)[\s\S]*?      # match the last 'distinctiveString'
)                                           #
value="([^"]*)"                             #
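
To check the fallback branch, here is a minimal Java sketch that runs the pattern above against the three-entry input and against a five-entry input. The class and method names (FallbackDemo, firstValue) are illustrative only and not part of the original answer.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FallbackDemo {

    // The pattern from above: the 4th 'distinctiveString', or the last one if there are fewer.
    static final Pattern PATTERN = Pattern.compile(
            "(?:(?:[\\s\\S]*?distinctiveString){4}[\\s\\S]*?"
            + "|(?:[\\s\\S]*distinctiveString)[\\s\\S]*?)"
            + "value=\"([^\"]*)\"");

    // Returns the captured value, or null when nothing matches.
    static String firstValue(String text) {
        Matcher m = PATTERN.matcher(text);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String three = "distinctiveString randomStuff value=\"val1\"\n"
                + "moreRandomStuff\n"
                + "distinctiveString randomStuff value=\"val2\"\n"
                + "moreRandomStuff\n"
                + "distinctiveString randomStuff value=\"val3\"";
        String five = three + "\nmoreRandomStuff\n"
                + "distinctiveString randomStuff value=\"val4\"\n"
                + "moreRandomStuff\n"
                + "distinctiveString randomStuff value=\"val5\"";

        System.out.println(firstValue(three)); // fewer than 4 entries: falls back to the last one
        System.out.println(firstValue(five));  // at least 4 entries: picks the 4th one
    }
}
```

The first alternative is tried first, so the greedy "last one" branch only applies when four occurrences cannot be found.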

Looking at your profile, it seems you are most active in the Java tag, so here is a small Java demo:

import java.util.regex.*;

public class Main {

    private static String getNthMatch(int n, String text, String distinctive) {
        String regex = String.format(
                "(?xs)                 # enable comments and dot-all           \n" +
                "(?:                   # start non-capturing group 1           \n" +
                "  (?:.*?%s){%d}       #   match n 'distinctive' strings       \n" +
                "  |                   #   OR                                  \n" +
                "  (?:.*%s)            #   match the last 'distinctive' string \n" +
                ")                     # end non-capturing group 1             \n" +
                ".*?value=\"([^\"]*)\" # match the value                       \n",
                distinctive, n, distinctive
        );
        Matcher m = Pattern.compile(regex).matcher(text);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) throws Exception {
        String text = "distinctiveString randomStuff value=\"val1\" \n" +
                "moreRandomStuff                                    \n" +
                "distinctiveString randomStuff value=\"val2\"       \n" +
                "moreRandomStuff                                    \n" +
                "distinctiveString randomStuff value=\"val3\"       \n" +
                "moreRandomStuff                                    \n" +
                "distinctiveString randomStuff value=\"val4\"       \n" +
                "moreRandomStuff                                    \n" +
                "distinctiveString randomStuff value=\"val5\"         ";

        String distinctive = "distinctiveString";

        System.out.println(getNthMatch(4, text, distinctive));
        System.out.println(getNthMatch(5, text, distinctive));
        System.out.println(getNthMatch(6, text, distinctive));
        System.out.println(getNthMatch(7, text, distinctive));
    }
}

which will print the following to the console:

val4
val5
val5
val5

Note that . matches the same as [\s\S] when the dot-all option ((?s)) is enabled.
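
The dot-all equivalence can be seen with a short sketch. The class name DotAllDemo and the helper matches are made up for this illustration; the input is just "a", a newline, and "b".

```java
import java.util.regex.Pattern;

class DotAllDemo {

    // True when the whole input matches the given regex.
    static boolean matches(String regex, String input) {
        return Pattern.compile(regex).matcher(input).matches();
    }

    public static void main(String[] args) {
        String input = "a\nb";
        System.out.println(matches("a.b", input));        // false: . skips line terminators by default
        System.out.println(matches("(?s)a.b", input));    // true: dot-all lets . match the newline
        System.out.println(matches("a[\\s\\S]b", input)); // true: [\s\S] matches any character
    }
}
```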

EDIT

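Yes, {1,n} is greedy. However, when you place [\s\S]*? after distinctiveString, as in (?:distinctiveString[\s\S]*?){1,3}, distinctiveString is matched first and then [\s\S]*? reluctantly matches as few characters as possible (zero to begin with). That lazy part only grows one character at a time until the rest of the pattern can match, so the engine reaches the first value=" before it ever reaches a second distinctiveString, and the group ends up repeating only once. What you want to do is move [\s\S]*? before distinctiveString: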

import java.util.regex.*;

public class Main {

    private static String getNthMatch(int n, String text, String distinctive) {
        String regex = String.format(
                "(?:[\\s\\S]*?%s){1,%d}[\\s\\S]*?value=\"([^\"]*)\"",
                distinctive, n
        );
        Matcher m = Pattern.compile(regex).matcher(text);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) throws Exception {
        String text = "distinctiveString randomStuff value=\"val1\" \n" +
                "moreRandomStuff                                    \n" +
                "distinctiveString randomStuff value=\"val2\"       \n" +
                "moreRandomStuff                                    \n" +
                "distinctiveString randomStuff value=\"val3\"       \n" +
                "moreRandomStuff                                    \n" +
                "distinctiveString randomStuff value=\"val4\"       \n" +
                "moreRandomStuff                                    \n" +
                "distinctiveString randomStuff value=\"val5\"         ";

        String distinctive = "distinctiveString";

        System.out.println(getNthMatch(4, text, distinctive));
        System.out.println(getNthMatch(5, text, distinctive));
        System.out.println(getNthMatch(6, text, distinctive));
        System.out.println(getNthMatch(7, text, distinctive));
    }
}

which also prints:

val4
val5
val5
val5
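
The explanation in the EDIT can be checked by running the question's original pattern and the reordered one side by side. This is only a sketch: the class name OriginalPatternDemo, the constants ORIGINAL and FIXED, and the helper firstValue are illustrative names, and the values of n are the ones from the question (2, 5 and 8).

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class OriginalPatternDemo {

    // Pattern from the question: the lazy part comes AFTER distinctiveString.
    static final String ORIGINAL = "(?:distinctiveString[\\s\\S]*?){1,%d}value=\"([^\"]*)\"";
    // Reordered pattern from the EDIT: the lazy part comes BEFORE distinctiveString.
    static final String FIXED = "(?:[\\s\\S]*?distinctiveString){1,%d}[\\s\\S]*?value=\"([^\"]*)\"";

    // Returns group 1 of the first match, or null when nothing matches.
    static String firstValue(String regex, String text) {
        Matcher m = Pattern.compile(regex).matcher(text);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        StringBuilder sb = new StringBuilder();
        for (int i = 1; i <= 5; i++) {
            sb.append("distinctiveString randomStuff value=\"val").append(i).append("\"\n");
            sb.append("moreRandomStuff\n");
        }
        String text = sb.toString();

        for (int n : new int[] {2, 5, 8}) {
            System.out.printf("n=%d original=%s fixed=%s%n",
                    n,
                    firstValue(String.format(ORIGINAL, n), text),
                    firstValue(String.format(FIXED, n), text));
        }
        // n=2 original=val1 fixed=val2
        // n=5 original=val1 fixed=val5
        // n=8 original=val1 fixed=val5
    }
}
```

The original pattern returns val1 for every n, while the reordered one returns the nth value or the last one when there are fewer than n.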