Regular expression matching "dictionary words"

Source: https://www.devze.com, 2023-04-12 13:27 (reposted from the web)
I'm a Java user but I'm new to regular expressions.

I just want to have a tiny expression that, given a word (we assume that the string is only one word), answers with a boolean, telling if the word is valid or not.

For example, I want to accept every word that could plausibly appear in a dictionary. So I just want words made of the characters a-z and A-Z, a hyphen (for example: man-in-the-middle), and an apostrophe (like I'll or Tiffany's).

Valid words:

  • "food"
  • "RocKet"
  • "man-in-the-middle"
  • "kahsdkjhsakdhakjsd"
  • "JESUS", etc.

Invalid words:

  • "gipsy76"
  • "www.google.com"
  • "me@gmail.com"
  • "745474"
  • "+-x/", etc.

I use this code, but it doesn't give the correct answer:

Pattern p = Pattern.compile("[A-Za-z&-&']");
Matcher m = p.matcher(s);
System.out.println(m.matches());

What's wrong with my regex?


  • Add a + after the character class to say "one or more of those characters".
  • Escape the hyphen with \ (written as \\ inside a Java string literal), or put it last in the class.
  • Remove those & characters.

Here's the code:

Pattern p = Pattern.compile("[A-Za-z'-]+");
Matcher m = p.matcher(s);
System.out.println(m.matches());

Complete test:

String[] ok = {"food","RocKet","man-in-the-middle","kahsdkjhsakdhakjsd","JESUS"};
String[] notOk = {"gipsy76", "www.google.com", "me@gmail.com", "745474","+-x/" };

Pattern p = Pattern.compile("[A-Za-z'-]+");

for (String shouldMatch : ok)
    if (!p.matcher(shouldMatch).matches())
        System.out.println("Error on: " + shouldMatch);

for (String shouldNotMatch : notOk)
    if (p.matcher(shouldNotMatch).matches())
        System.out.println("Error on: " + shouldNotMatch);

(Produces no output.)
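
To make the effect of the missing + visible, here is a minimal sketch that compares the character class without a quantifier against the corrected pattern. The class name WordCheck, the constant names, and the helper isWord are made up for this illustration and are not part of the original answer.

```java
import java.util.regex.Pattern;

// Hypothetical helper class; the names here are illustrative assumptions.
final class WordCheck {
    // Character class with no quantifier: it describes exactly one character,
    // so matches() can only succeed on a one-character string.
    static final Pattern NO_QUANTIFIER = Pattern.compile("[A-Za-z'-]");

    // Corrected pattern: the hyphen is last (so it is literal) and '+' means one or more.
    static final Pattern FIXED = Pattern.compile("[A-Za-z'-]+");

    // Returns true if the whole string consists only of letters, hyphens, and apostrophes.
    static boolean isWord(String s) {
        return FIXED.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(NO_QUANTIFIER.matcher("food").matches()); // false: more than one character
        System.out.println(NO_QUANTIFIER.matcher("f").matches());    // true: exactly one allowed character
        System.out.println(isWord("food"));                          // true
        System.out.println(isWord("gipsy76"));                       // false: digits are not in the class
    }
}
```

Compiling the Pattern once as a constant, rather than on every call, is a common choice when the check runs many times.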


This should work:

"[A-Za-z'-]+"


But "-word" and "word-" are not valid, so you can use this pattern:

WORD_EXP = "^[A-Za-z]+(-[A-Za-z]+)*$"
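
Note that the pattern above allows only hyphens between letter runs, so it turns down words such as "Tiffany's". The sketch below shows that behaviour next to one possible variant that also treats a single apostrophe as a separator. The variant, the class name StrictWord, and the constant names are assumptions for illustration, not something the answer proposed.

```java
import java.util.regex.Pattern;

// Hypothetical class name, used only for illustration.
final class StrictWord {
    // Pattern from the answer: letter runs joined by single hyphens.
    static final Pattern HYPHEN_ONLY = Pattern.compile("^[A-Za-z]+(-[A-Za-z]+)*$");

    // Assumed variant: letter runs joined by a single hyphen or a single apostrophe.
    static final Pattern WITH_APOSTROPHE = Pattern.compile("[A-Za-z]+(?:['-][A-Za-z]+)*");

    public static void main(String[] args) {
        String[] samples = {"man-in-the-middle", "Tiffany's", "-word", "word-", "man--in"};
        for (String s : samples) {
            System.out.println(s
                + " hyphenOnly=" + HYPHEN_ONLY.matcher(s).matches()
                + " withApostrophe=" + WITH_APOSTROPHE.matcher(s).matches());
        }
    }
}
```

Because matches() already requires the whole input to match, the ^ and $ anchors are optional in this usage. The variant still rejects a trailing apostrophe (as in dogs'), which may or may not be what you want.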


Regex - /^([a-zA-Z]*('|-)?[a-zA-Z]+)*/

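You can use the regex above if you don't want successive "'" or "-" characters. It accepts strings such as man-in-the-middle and asd'asdasd'asd.

It rejects strings such as man--in--midle and asdasd''asd. Note that the pattern has no trailing $, so these rejections hold only when the whole string is required to match, as with Java's matches().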
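
A small sketch of how this regex behaves in Java, assuming the surrounding slashes are dropped and the pattern is used with java.util.regex. The class and method names are hypothetical. It contrasts matches(), which needs the entire input to match, with find(), which is satisfied by a matching prefix.

```java
import java.util.regex.Pattern;

// Hypothetical class name, used only for illustration.
final class NoDoubleSeparator {
    // Regex from the answer above, without the surrounding slashes.
    static final Pattern P = Pattern.compile("^([a-zA-Z]*('|-)?[a-zA-Z]+)*");

    // matches() requires the entire input to match the pattern.
    static boolean fullMatch(String s) {
        return P.matcher(s).matches();
    }

    // find() succeeds as soon as some prefix matches, including an empty one.
    static boolean prefixMatch(String s) {
        return P.matcher(s).find();
    }

    public static void main(String[] args) {
        String[] samples = {"man-in-the-middle", "asd'asdasd'asd", "man--in--midle", "asdasd''asd", ""};
        for (String s : samples) {
            System.out.println("\"" + s + "\" full=" + fullMatch(s) + " prefix=" + prefixMatch(s));
        }
    }
}
```

Two things to watch for: the empty string is accepted, because the outer group may repeat zero times, and a leading separator such as "-word" is accepted as well. Changing the final * to + would be one way to require at least one letter.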


Hi Aloob, please check this one. It is a bit lengthy and there may be a shorter version, but still...

[A-z]*||[[A-z]*[-]*]*||[[A-z]*[-]*[']*]*
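
One thing worth checking before using the pattern above is the range [A-z]. In ASCII it spans more than the letters: it also covers the characters [, \, ], ^, _ and ` that sit between Z and a. The sketch below compares it with [A-Za-z]; the class and constant names are made up for this illustration.

```java
import java.util.regex.Pattern;

// Hypothetical class name, used only for illustration.
final class RangeCheck {
    // Range from 'A' (65) to 'z' (122): includes six non-letter characters.
    static final Pattern WIDE = Pattern.compile("[A-z]+");

    // Two separate ranges: ASCII letters only.
    static final Pattern LETTERS = Pattern.compile("[A-Za-z]+");

    public static void main(String[] args) {
        String[] samples = {"food", "snake_case", "a^b"};
        for (String s : samples) {
            System.out.println(s
                + " wide=" + WIDE.matcher(s).matches()
                + " letters=" + LETTERS.matcher(s).matches());
        }
    }
}
```

So a check built on [A-z] would let through strings like snake_case that the question wants to reject.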