Java regex patterns

开发者 https://www.devze.com 2023-04-10 03:58 Source: web
I need help with this matter. Look at the following regex:

Pattern pattern = Pattern.compile("[A-Za-z]+(\\-[A-Za-z]+)");
Matcher matcher = pattern.matcher(s1);

I want to match words like "home-made" and "aaaa-bbb", but not "aaa - bbb" and not "aaa--aa--aaa". Basically, I want the following:

word - hyphen - word.

It works for everything, except that "aaa--aaa--aaa" passes and shouldn't. What regex will work for this pattern?


You can remove the backslash from your expression, since a hyphen has no special meaning outside a character class:

"[A-Za-z]+-[A-Za-z]+"

The following code should then work:

Pattern pattern = Pattern.compile("[A-Za-z]+-[A-Za-z]+");
Matcher matcher = pattern.matcher("aaa-bbb");
boolean match = matcher.matches();

Note that you can use Matcher.matches() instead of Matcher.find() in order to check the complete string for a match.
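To make the difference concrete, here is a minimal sketch comparing the two calls on the same pattern. The class and method names are made up for illustration; only Pattern and Matcher come from java.util.regex. It shows one way a string with a double hyphen can appear to "pass": find() is satisfied by a substring, while matches() has to consume the whole input.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of the original question.
class MatchesVsFind {

    // Same expression as above: letters, one hyphen, letters.
    static final Pattern WORD = Pattern.compile("[A-Za-z]+-[A-Za-z]+");

    // True only if the whole input is a single hyphenated word.
    static boolean wholeStringMatches(String s) {
        return WORD.matcher(s).matches();
    }

    // Returns the first substring that fits the pattern, or null if there is none.
    static String firstPartialMatch(String s) {
        Matcher m = WORD.matcher(s);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String s = "home--home-home";
        System.out.println(wholeStringMatches(s)); // false
        System.out.println(firstPartialMatch(s));  // home-home
    }
}
```

So if the check is meant to accept or reject a single word, matches() is the safer call; find() answers a different question, namely whether a hyphenated word occurs anywhere in the input.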

If instead you want to look inside a string using Matcher.find() you can use the expression

"(^|\\s)[A-Za-z]+-[A-Za-z]+(\\s|$)"

but note that this only finds words delimited by whitespace (i.e. a word followed by punctuation, like "aaa-bbb.", is not found). To also capture this case you can use lookbehinds and lookaheads:

"(?<![A-Za-z-])[A-Za-z]+-[A-Za-z]+(?![A-Za-z-])"

which will read

(?<![A-Za-z-])        // before the match there must not be a letter or -
[A-Za-z]+             // the match itself consists of one or more letters
-                     // followed by a -
[A-Za-z]+             // followed by one or more letters
(?![A-Za-z-])         // but afterwards not by any letter or -

An example:

Pattern pattern = Pattern.compile("(?<![A-Za-z-])[A-Za-z]+-[A-Za-z]+(?![A-Za-z-])");
Matcher matcher = pattern.matcher("It is home-made.");
if (matcher.find()) {
    System.out.println(matcher.group());    // => home-made
}
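If the text may contain several hyphenated words, the same lookaround expression can be applied in a loop. The following is only a sketch: the class name, the method name and the sample sentence are invented for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class built around the lookaround expression above.
class HyphenatedWordFinder {

    static final Pattern HYPHENATED =
            Pattern.compile("(?<![A-Za-z-])[A-Za-z]+-[A-Za-z]+(?![A-Za-z-])");

    // Collects every word-hyphen-word occurrence in the text, in order.
    static List<String> findAll(String text) {
        List<String> words = new ArrayList<>();
        Matcher m = HYPHENATED.matcher(text);
        while (m.find()) {
            words.add(m.group());
        }
        return words;
    }

    public static void main(String[] args) {
        String text = "A home-made, well-known recipe; not aaa--aa--aaa or aaa - bbb.";
        System.out.println(findAll(text)); // [home-made, well-known]
    }
}
```

Because the lookarounds also forbid a neighbouring hyphen, words with more than one hyphen such as "mother-in-law" are skipped entirely, which fits the "word - hyphen - word" requirement but may or may not be what you want.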


Actually, I can't reproduce the problem with your expression when the String holds a single word. As the discussion in the comments cleared up, though, the String s contains a whole sentence, which first has to be tokenised into words that are then each matched or not.

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegExp {

        private static void match(String s) {
                Pattern pattern = Pattern.compile("[A-Za-z]+(\\-[A-Za-z]+)");
                Matcher matcher = pattern.matcher(s);
                if (matcher.matches()) {
                        System.out.println("'" + s + "' match");
                } else {
                        System.out.println("'" + s + "' doesn't match");
                }
        }

        /**
        * @param args
        */
        public static void main(String[] args) {
                match(" -home-made");
                match("home-made");
                match("aaaa-bbb");
                match("aaa - bbb");
                match("aaa--aa--aaa");
                match("home--home-home");
        }

}

The output is:

' -home-made' doesn't match
'home-made' match
'aaaa-bbb' match
'aaa - bbb' doesn't match
'aaa--aa--aaa' doesn't match
'home--home-home' doesn't match
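For the whole-sentence case, a rough sketch of the tokenise-then-match approach could look like this. How the sentence is actually tokenised is not shown in the question, so splitting on whitespace is an assumption here, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper class; splitting on whitespace is an assumption.
class SentenceTokens {

    static final Pattern WORD = Pattern.compile("[A-Za-z]+-[A-Za-z]+");

    // Splits the sentence on whitespace and keeps the tokens that are
    // exactly one word, one hyphen, one word.
    static List<String> hyphenatedTokens(String sentence) {
        List<String> result = new ArrayList<>();
        for (String token : sentence.trim().split("\\s+")) {
            if (WORD.matcher(token).matches()) {
                result.add(token);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String sentence = "this is home-made but aaa--aa--aaa and aaa - bbb are not";
        System.out.println(hyphenatedTokens(sentence)); // [home-made]
    }
}
```

Note that a token with trailing punctuation, such as "home-made.", is rejected by matches() in this sketch; strip the punctuation first or use the lookaround expression if such tokens should count.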