Java Regex: how to capture multiple matches in the same line

Developer https://www.devze.com 2023-04-06 08:40 Source: Internet
I am trying to match a regex pattern in Java, and I have two questions:

  1. The pattern I'm looking for has a known beginning followed by an unknown string, which I want to capture up to the first occurrence of an &.
  2. There are multiple occurrences of this pattern in the line, and I would like to get each occurrence separately.

For example I have this input line:

1234567 100,110,116,129,139,140,144,146 http://www.gold.com/shc/s/c_10153_12605_Computers+%26+Electronics_Televisions?filter=Screen+Refresh+Rate%7C120HZ%5EScreen+Size%7C37+in.+to+42+in.&sName=View+All&viewItems=25&subCatView=true   ISx20070515x00001a          http://www.gold.com/shc/s/c_10153_12605_Computers+%26+Electronics_Televisions?filter=Screen+Refresh+Rate%7C120HZ&sName=View+All&subCatView=true 0   2819357575609397706

And I am interested in these strings:

Screen+Refresh+Rate%7C120HZ%5EScreen+Size%7C37+in.+to+42+in.

Screen+Refresh+Rate%7C120HZ


Assuming the known beginning is filter=**, the regular expression pattern (?:filter=\\*\\*)(.*?)(?:&) should get you what you need. Use Matcher.find() to get all occurrences of the pattern in a given string. Using the test string you provided, the following:

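import java.util.regex.Matcher;
import java.util.regex.Pattern;

// testString holds the input line, with the ** markers around each target string
final Pattern p = Pattern.compile("(?:filter=\\*\\*)(.*?)(?:&)");
final Matcher m = p.matcher(testString);
int cnt = 0;
while (m.find()) {
    // group(1) is the lazily matched text between "filter=**" and the next '&'
    System.out.println(++cnt + ": G1: " + m.group(1));
}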

Will output:

1: G1: Screen+Refresh+Rate%7C120HZ%5EScreen+Size%7C37+in.+to+42+in.
2: G1: Screen+Refresh+Rate%7C120HZ**


If I knew that I might need other query parameters in the future, I think it would be more prudent to decode and parse the URL.

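import java.net.URLDecoder;
import java.util.HashMap;
import java.util.Map;
import java.util.Map.Entry;
import java.util.regex.Pattern;

String url = "http://www.gold.com/shc/s/c_10153_12605_" +
            "Computers+%26+Electronics_Televisions?filter=Screen+Refresh+Rate" +
            "%7C120HZ%5EScreen+Size%7C37+in.+to+42+in.&sName=View+All&viewItems=25&subCatView=true";
Pattern amp = Pattern.compile("&");
Pattern eq = Pattern.compile("=");
Map<String, String> params = new HashMap<String, String>();
String queryString = url.substring(url.indexOf('?') + 1);
for (String param : amp.split(queryString)) {
    // Split on the first '=' only, and decode after splitting so that an
    // encoded '&' or '=' inside a value cannot break the pair apart.
    String[] pair = eq.split(param, 2);
    String value = pair.length > 1 ? pair[1] : "";
    // decode(String, String) declares the checked UnsupportedEncodingException
    params.put(URLDecoder.decode(pair[0], "UTF-8"), URLDecoder.decode(value, "UTF-8"));
}
for (Entry<String, String> param : params.entrySet()) {
    System.out.format("%s = %s\n", param.getKey(), param.getValue());
}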

Output

subCatView = true
viewItems = 25
sName = View All
filter = Screen Refresh Rate|120HZ^Screen Size|37 in. to 42 in.


In your example, there is sometimes a "**" at the end before the "&". But basically (assuming "filter=" is the start pattern you are looking for), you want something like:

"filter=([^&]+)&"

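As a rough sketch of how that pattern could be used with Matcher.find(), the class below collects capture group 1 for every occurrence in a line. The class name FilterValues, the method extract, and the shortened input line are made up for illustration and are not from the question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class FilterValues {
    // Capture everything between "filter=" and the next '&'.
    private static final Pattern FILTER = Pattern.compile("filter=([^&]+)&");

    // Returns the captured value of every "filter=" occurrence, in order.
    static List<String> extract(String line) {
        List<String> values = new ArrayList<>();
        Matcher m = FILTER.matcher(line);
        while (m.find()) {
            values.add(m.group(1));
        }
        return values;
    }

    public static void main(String[] args) {
        // Hypothetical shortened input, not the full line from the question.
        String line = "http://a.example/x?filter=Rate%7C120HZ%5ESize%7C37&sName=All "
                + "http://a.example/x?filter=Rate%7C120HZ&sName=All";
        for (String value : extract(line)) {
            System.out.println(value);
        }
    }
}
```

Note that the pattern requires an "&" after the value, so a "filter=" parameter that comes last in its URL would not be matched, and any "**" markers in the input would be part of the captured text.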

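Using the regular expression (?<=filter=\*{0,2})(?!\*)[^&]*[^&*]+ in Java (the (?!\*) lookahead stops a match from starting on the leading **, which the lookbehind alone would allow because it also accepts zero asterisks):

Pattern p = Pattern.compile("(?<=filter=\\*{0,2})(?!\\*)[^&]*[^&*]+");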
String s = "1234567 100,110,116,129,139,140,144,146 http://www.gold.com/shc/s/c_10153_12605_Computers+%26+Electronics_Televisions?filter=**Screen+Refresh+Rate%7C120HZ%5EScreen+Size%7C37+in.+to+42+in.&sName=View+All**&viewItems=25&subCatView=true   ISx20070515x00001a          http://www.gold.com/shc/s/c_10153_12605_Computers+%26+Electronics_Televisions?filter=**Screen+Refresh+Rate%7C120HZ**&sName=View+All&subCatView=true 0   2819357575609397706";
Matcher m = p.matcher(s);
while (m.find()) {
    System.out.println(m.group());
}

EDIT:

Added [^&*]+ to the end of the regex to prevent the ** from being included in the second match.

EDIT2:

Changed regular expression to use lookbehind.


The regex you're looking for is

Screen\+Refresh\+Rate[^&]*

You could use Matcher.find() to find all matches.
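
A minimal sketch of that loop, assuming the wanted strings always start with Screen+Refresh+Rate. The class name RefreshRateMatches, the method findAll, and the shortened input are hypothetical and only for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class RefreshRateMatches {
    // '+' is escaped because it is a quantifier in regex syntax.
    private static final Pattern RATE = Pattern.compile("Screen\\+Refresh\\+Rate[^&]*");

    // Returns every match in the line, each running up to (not including) the next '&'.
    static List<String> findAll(String line) {
        List<String> matches = new ArrayList<>();
        Matcher m = RATE.matcher(line);
        while (m.find()) {
            matches.add(m.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        // Hypothetical shortened input, not the full line from the question.
        String line = "x?filter=Screen+Refresh+Rate%7C120HZ%5EScreen+Size%7C37+in.+to+42+in.&sName=View+All "
                + "y?filter=Screen+Refresh+Rate%7C120HZ&sName=View+All";
        for (String match : findAll(line)) {
            System.out.println(match);
        }
    }
}
```

Because [^&]* only stops at an "&", a value that is the last parameter of its URL would run on past the end of the URL, and trailing "**" markers would be included in the match.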


Are you looking for the string that follows "filter=", skips any leading "*", and ends at the first "&"? You can try the following:

String str = "1234567 100,110,116,129,139,140,144,146 http://www.gold.com/shc/s/c_10153_12605_Computers+%26+Electronics_Televisions?filter=**Screen+Refresh+Rate%7C120HZ%5EScreen+Size%7C37+in.+to+42+in.&sName=View+All**&viewItems=25&subCatView=true   ISx20070515x00001a          http://www.gold.com/shc/s/c_10153_12605_Computers+%26+Electronics_Televisions?filter=**Screen+Refresh+Rate%7C120HZ**&sName=View+All&subCatView=true 0   2819357575609397706";
Pattern p = Pattern.compile("filter=(?:\\**)([^&]+?)(?:\\**)&");

Matcher matcher = p.matcher(str);
while (matcher.find()) {
    System.out.println(matcher.group(1));
}