Problems with matching emails with regex

Developer https://www.devze.com 2022-12-28 10:09 Source: web
I'm trying to match an email address. Here is what I've come up with so far:

String text = "gandalf_storm@mymail.com"; 
String regex = "(\\w+)@{1}(\\w+){2,}\\.{1}\\w{2,4}";

This works for the following cases:

gandalf_storm@mymail.com

gandalfstorm@mymail.com

gandalf2storm@mymail.com

So it matches one or more word characters (alphanumerics or underscore) before a single @, followed by at least two word characters (the minimum length of any domain name), followed by one . (dot), and then between 2 and 4 word characters (because there are domains such as .us or .mobi).

This expression, however, does not work with emails such as:

gandalf.storm@mymail.com

gandalf.storm@mydomain.me.uk

gandalf.storm@mysubdomain.mydomain.me.uk

gandalf.storm@mysubdomain.mysubdomain.mydomain.me.uk

and so on, with any number of subdomains

or

gandalf.storm@mymail.com

gandalf2storm@mydomain.me.uk

gandalf_storm@mysubdomain.mydomain.me.uk

gandalfstorm@mysubdomain.mysubdomain.mydomain.me.uk

I've just started to learn regex, and I find it interesting to try to solve problems like this with regex, not partially but for every case. Any help would be much appreciated. Thank you.


This question has been asked many, many times before here on SO. Here's why you don't want to use regexes to parse email addresses. Please note that that monster of a regex doesn't even handle comments.


See this question, particularly the answer from Good Person. This article has some code for validating an email address.


The regex you use is very restrictive:

  • Using the \w character class before the @ does not allow the . character, which explains why gandalf.storm does not match
  • In the domain part of the regex, you only allow two "words" separated with a . character, which excludes "mysubdomain.mydomain.net"

You should try to fix these to match your more complicated examples.

As a side note, when you want to match a single character, the {1} quantifier is redundant and can be omitted.
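As a rough illustration of the two points above, here is a minimal Java sketch. The class name, method name, and pattern are assumptions made up for this example, and it is not a complete email validator: it only allows dots in the part before the @ and any number of dot-separated labels in the domain.

```java
import java.util.regex.Pattern;

class EmailPatternSketch {
    // Hypothetical pattern, for illustration only:
    //   [\w.]+     local part: word characters or dots
    //   @          a single @ (no {1} needed)
    //   (\w+\.)+   one or more domain labels, each followed by a dot
    //   \w{2,4}    final label of 2 to 4 word characters
    static final Pattern EMAIL =
            Pattern.compile("[\\w.]+@(\\w+\\.)+\\w{2,4}");

    // Returns true only if the whole string matches the pattern.
    static boolean looksLikeEmail(String text) {
        return EMAIL.matcher(text).matches();
    }

    public static void main(String[] args) {
        String[] samples = {
            "gandalf_storm@mymail.com",
            "gandalf.storm@mymail.com",
            "gandalf.storm@mysubdomain.mydomain.me.uk",
            "gandalf.storm@mymail"   // no dot in the domain, so rejected
        };
        for (String sample : samples) {
            System.out.println(sample + " -> " + looksLikeEmail(sample));
        }
    }
}
```

This still accepts strings that are not valid addresses (for example a local part made only of dots) and rejects some valid ones, which is the reason the other answers advise against relying on a regex for full validation.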


To answer your question, since you are learning:

Your regex fails on the first group of addresses partly because the part before the @ does not allow the '.' character. Changing it to this:

 String regex = "([\\w.]+)@(\\w+){2,}\\.\\w{2,4}";

should allow gandalf.storm@mymail.com, because [\\w.]+ means one or more characters that are each either in the class '\w' (a letter, digit, or underscore) or a '.' (inside a character class the dot does not need to be escaped and means a literal dot).

This might be enough help for you to figure out the rest on your own; after all, that is the point of learning :)

I tested this at http://www.regexplanet.com/simple/index.html which uses the java library for the engine.
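The revised pattern can also be checked locally with a few lines of Java. This is only a sketch: the class and method names are made up for the example, and `String.matches` requires the whole string to match.

```java
class RevisedRegexCheck {
    // The regex from this answer: only the local part was changed to allow dots.
    static final String REGEX = "([\\w.]+)@(\\w+){2,}\\.\\w{2,4}";

    // True when the entire input matches the revised regex.
    static boolean matchesRevised(String text) {
        return text.matches(REGEX);
    }

    public static void main(String[] args) {
        // The dotted local part is now accepted.
        System.out.println(matchesRevised("gandalf.storm@mymail.com"));
        // The multi-label domain is still rejected; that part is left to fix.
        System.out.println(matchesRevised("gandalf.storm@mydomain.me.uk"));
    }
}
```

The second address still fails because the domain part of the pattern allows only one dot, which is the remaining piece to work out.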
