I'm trying to find the nth match, or the last match if there are fewer than n. n is determined within my program, and the regex string is constructed with 'n' replaced by an integer.
Here is my best guess, but my repetition operator {1,n} always seems to match just once. I thought it would be greedy by default.
The basic regex would be:
distinctiveString[\s\S]*?value="([^"]*)"
So I modified it to this to try to get the nth one instead:
(?:distinctiveString[\s\S]*?){1,n}value="([^"]*)"
distinctiveString randomStuff value="val1"
moreRandomStuff
distinctiveString randomStuff value="val2"
moreRandomStuff
distinctiveString randomStuff value="val3"
moreRandomStuff
distinctiveString randomStuff value="val4"
moreRandomStuff
distinctiveString randomStuff value="val5"
So in this case, with n = 2 I'd want 'val2', with n = 5 I'd want 'val5', and with n = 8 I'd also want 'val5'.
I'm passing my regular expression through an application layer, but I think it's being handed directly to Perl as is.
Try something like this:
(?:(?:[\s\S]*?distinctiveString){4}[\s\S]*?|(?:[\s\S]*distinctiveString)[\s\S]*?)value="([^"]*)"
which captures "val4" in match group 1 for your sample input, or "val3" (the last match) for this shorter input:
distinctiveString randomStuff value="val1"
moreRandomStuff
distinctiveString randomStuff value="val2"
moreRandomStuff
distinctiveString randomStuff value="val3"
A quick break down of the pattern:
(?:                                        # start non-capturing group
  (?:[\s\S]*?distinctiveString){4}[\s\S]*? #   match 4 'distinctiveString's
  |                                        #   OR
  (?:[\s\S]*distinctiveString)[\s\S]*?     #   match the last 'distinctiveString'
)                                          # end non-capturing group
value="([^"]*)"                            # match the value
Judging by your profile, you are most active in the Java tag, so here is a small Java demo:
import java.util.regex.*;

public class Main {

    private static String getNthMatch(int n, String text, String distinctive) {
        String regex = String.format(
            "(?xs) # enable comments and dot-all \n" +
            "(?: # start non-capturing group 1 \n" +
            " (?:.*?%s){%d} # match n 'distinctive' strings \n" +
            " | # OR \n" +
            " (?:.*%s) # match the last 'distinctive' string \n" +
            ") # end non-capturing group 1 \n" +
            ".*?value=\"([^\"]*)\" # match the value \n",
            distinctive, n, distinctive
        );
        Matcher m = Pattern.compile(regex).matcher(text);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) throws Exception {
        String text = "distinctiveString randomStuff value=\"val1\" \n" +
            "moreRandomStuff \n" +
            "distinctiveString randomStuff value=\"val2\" \n" +
            "moreRandomStuff \n" +
            "distinctiveString randomStuff value=\"val3\" \n" +
            "moreRandomStuff \n" +
            "distinctiveString randomStuff value=\"val4\" \n" +
            "moreRandomStuff \n" +
            "distinctiveString randomStuff value=\"val5\" ";
        String distinctive = "distinctiveString";
        System.out.println(getNthMatch(4, text, distinctive));
        System.out.println(getNthMatch(5, text, distinctive));
        System.out.println(getNthMatch(6, text, distinctive));
        System.out.println(getNthMatch(7, text, distinctive));
    }
}
which will print the following to the console:
val4
val5
val5
val5
Note that the . matches the same as [\s\S] when the dot-all option ((?s)) is enabled.
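As a quick check of that equivalence, here is a minimal sketch comparing a plain dot, a dot under (?s), and [\s\S] on a string that contains a line break. The class name DotAllDemo, the helper firstMatch, and the sample text are made up for this illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DotAllDemo {
    // Hypothetical helper: returns the first match of regex in text, or null if none.
    static String firstMatch(String regex, String text) {
        Matcher m = Pattern.compile(regex).matcher(text);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String text = "a\nb"; // made-up sample input with a line break

        // Without dot-all, '.' does not match the line terminator, so there is no match.
        System.out.println(firstMatch("a.b", text) == null);                 // true
        // With (?s), '.' also matches the newline.
        System.out.println(text.equals(firstMatch("(?s)a.b", text)));        // true
        // [\s\S] matches any character regardless of flags.
        System.out.println(text.equals(firstMatch("a[\\s\\S]b", text)));     // true
    }
}
```

Using [\s\S] keeps the pattern independent of flags, which may matter if the regex is passed through a layer where you cannot set options.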
EDIT
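Yes, {1,n} is greedy. However, when you place [\s\S]*? after distinctiveString, as in (?:distinctiveString[\s\S]*?){1,3}, distinctiveString is matched and then the reluctant [\s\S]*? matches zero chars. A second repetition would need another distinctiveString right at that position, so it fails, and the group is left with a single repetition. The engine then extends the reluctant part one char at a time, and the first position where the whole pattern succeeds is the first value="...", before a second distinctiveString is ever reached. What you want to do is move [\s\S]*? before distinctiveString: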
import java.util.regex.*;

public class Main {

    private static String getNthMatch(int n, String text, String distinctive) {
        String regex = String.format(
            "(?:[\\s\\S]*?%s){1,%d}[\\s\\S]*?value=\"([^\"]*)\"",
            distinctive, n
        );
        Matcher m = Pattern.compile(regex).matcher(text);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) throws Exception {
        String text = "distinctiveString randomStuff value=\"val1\" \n" +
            "moreRandomStuff \n" +
            "distinctiveString randomStuff value=\"val2\" \n" +
            "moreRandomStuff \n" +
            "distinctiveString randomStuff value=\"val3\" \n" +
            "moreRandomStuff \n" +
            "distinctiveString randomStuff value=\"val4\" \n" +
            "moreRandomStuff \n" +
            "distinctiveString randomStuff value=\"val5\" ";
        String distinctive = "distinctiveString";
        System.out.println(getNthMatch(4, text, distinctive));
        System.out.println(getNthMatch(5, text, distinctive));
        System.out.println(getNthMatch(6, text, distinctive));
        System.out.println(getNthMatch(7, text, distinctive));
    }
}
which also prints:
val4
val5
val5
val5
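To make the explanation in the edit concrete, here is a small sketch that runs both placements of [\s\S]*? against the sample input from the question. The class name LazyPlacementDemo and the helpers lazyAfter, lazyBefore and firstGroup are hypothetical names used only for this illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LazyPlacementDemo {
    // Sample input from the question: five distinctiveString/value pairs.
    static final String TEXT =
        "distinctiveString randomStuff value=\"val1\"\n" +
        "moreRandomStuff\n" +
        "distinctiveString randomStuff value=\"val2\"\n" +
        "moreRandomStuff\n" +
        "distinctiveString randomStuff value=\"val3\"\n" +
        "moreRandomStuff\n" +
        "distinctiveString randomStuff value=\"val4\"\n" +
        "moreRandomStuff\n" +
        "distinctiveString randomStuff value=\"val5\"";

    // Hypothetical helper: returns group 1 of the first match, or null if none.
    static String firstGroup(String regex) {
        Matcher m = Pattern.compile(regex).matcher(TEXT);
        return m.find() ? m.group(1) : null;
    }

    // Reluctant part AFTER the marker (the pattern from the question).
    static String lazyAfter(int n) {
        return firstGroup(
            "(?:distinctiveString[\\s\\S]*?){1," + n + "}value=\"([^\"]*)\"");
    }

    // Reluctant part BEFORE the marker (the pattern from the edit).
    static String lazyBefore(int n) {
        return firstGroup(
            "(?:[\\s\\S]*?distinctiveString){1," + n + "}[\\s\\S]*?value=\"([^\"]*)\"");
    }

    public static void main(String[] args) {
        System.out.println(lazyAfter(3));  // val1: the group repeats only once
        System.out.println(lazyBefore(3)); // val3: the third marker is reached
        System.out.println(lazyBefore(8)); // val5: fewer than 8 markers, so the last one
    }
}
```

Both patterns use the same greedy {1,n}; only the position of the reluctant part decides whether a further repetition can begin.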