I'm building a site that must be fully Unicode. The database etc. are working; I only have a small logic error. I'm testing my registration form with AJAX to check whether the fields are valid, and I check the email field with regular expressions.
However, if a user has an email address like 日本人@日人日本人.com, it doesn't get through.
Do email addresses of this type exist?
Are email addresses always like this? (a-z A-Z 0-9) @ (a-z A-Z 0-9).(a-z A-Z 0-9)
As per RFC 5322 ("Internet Message Format"), section 3.4.1 ("Addr-Spec Specification"), you can't use non-US-ASCII characters such as those you've listed. However, characters such as...
! # $ % & ' * + - / = ? ^ _ ` { | } ~
...are legal, as well as the full stop/period character, as long as it is not the first or last character and never appears twice in a row.
For more information see the above RFC and indeed the Wikipedia article on email addresses, specifically the "syntax" section.
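As a rough illustration of the rules above, here is a minimal C++ sketch that checks the unquoted ("dot-atom") form of the local part. The function name is_dot_atom_local_part is made up for this example, and quoted local parts such as "a b"@example.com are deliberately not handled, so treat it as a sketch rather than a complete RFC 5322 validator.

```cpp
#include <cctype>
#include <string>

// Hypothetical helper (not from any library): checks the unquoted
// "dot-atom" local-part rules of RFC 5322 section 3.4.1.
// Assumption: quoted local parts are out of scope and are rejected.
bool is_dot_atom_local_part(const std::string& local) {
    // Special characters allowed by the RFC's "atext" rule (includes the backtick).
    static const std::string specials = "!#$%&'*+-/=?^_`{|}~";

    // Must be non-empty and must not start or end with a dot.
    if (local.empty() || local.front() == '.' || local.back() == '.') {
        return false;
    }

    char prev = '\0';
    for (char c : local) {
        if (c == '.') {
            if (prev == '.') {
                return false;  // two dots in a row
            }
        } else {
            const unsigned char uc = static_cast<unsigned char>(c);
            const bool ascii_alnum = uc < 128 && std::isalnum(uc) != 0;
            if (!ascii_alnum && specials.find(c) == std::string::npos) {
                return false;  // spaces, '@', and any non-US-ASCII byte end up here
            }
        }
        prev = c;
    }
    return true;
}
```

With this sketch, john.smith and user+tag are accepted, while john..smith, .john and 日本人 are rejected, which matches what RFC 5322 alone allows.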
UPDATE
There's also a newer, albeit experimental, RFC 5336 (now obsoleted by RFC 6531), which handles the now-legitimate internationalized addresses and domains containing UTF-8 characters, etc.
You must be very careful when you try to validate email addresses with a regex. In some cases you will reject addresses that are in fact valid. Basically it's:
Show me one regex and I'll show you one email which doesn't match.
For that reason, when I check email addresses I use a very simple regex like .+@.+(\..+)+
(user part anything, host part has at least one dot). Anything else results in false positives and false negatives.
It's better not to pattern-match email addresses at all (only check trivial things like the "@") and to send opt-in emails instead.
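To make the "trivial check" idea concrete, here is a minimal C++ sketch. The name has_plausible_shape is invented for this example; it only tests that there is something before and after the last "@", and it assumes the real verification is done by the opt-in email.

```cpp
#include <string>

// Hypothetical pre-check (name invented for this sketch): accepts any string
// with at least one character before and after the last '@'.
// It works on raw bytes, so UTF-8 addresses pass through untouched.
bool has_plausible_shape(const std::string& address) {
    // Use the last '@' because a quoted local part may itself contain '@'.
    const std::string::size_type at = address.rfind('@');
    return at != std::string::npos   // there is an '@'
        && at > 0                    // something before it
        && at + 1 < address.size();  // something after it
}
```

An address such as 日本人@日人日本人.com passes this check, while @example.com and user@ do not. Whether the mailbox actually exists is then settled by the confirmation mail, not by the pattern.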
Usually addresses are of the form
[_a-zA-Z0-9]+(\.[_a-zA-Z0-9]+)*@[_a-zA-Z0-9]+(\.[_a-zA-Z0-9]+)+
or, in other words, \w+(\.\w+)*@\w+(\.\w+)+
This site also has useful information about email address patterns:
http://www.regular-expressions.info/email.html
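For what it's worth, here is a small C++ sketch of the \w-based pattern above, using std::regex with its default ECMAScript grammar. The function name matches_simple_pattern is made up for this example. It is meant to show what "usually" covers, not to recommend the pattern as a validator.

```cpp
#include <regex>
#include <string>

// Hypothetical wrapper around the \w-based pattern quoted above.
bool matches_simple_pattern(const std::string& address) {
    // Raw string literal, so the backslashes do not need doubling.
    static const std::regex simple(R"(\w+(\.\w+)*@\w+(\.\w+)+)");
    // regex_match requires the whole string to match, not just a part of it.
    return std::regex_match(address, simple);
}
```

john.smith@example.com matches, but user+tag@example.com and user@my-domain.com do not, because \w covers neither "+" nor "-". So the pattern describes the common shape of addresses while still rejecting some valid ones.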
It seems not many people have addressed whether such addresses actually exist. People before me have given beautiful regular expressions, so I won't repeat those.
I don't know much about the Japanese side, but as a native Chinese speaker who uses Chinese as my main language for browsing the Internet, I have never seen an email address in Chinese. There was a time when domains with Chinese characters were popular, but I believe that was accomplished on the DNS side, and it was a commercial bubble. Nowadays you rarely see domains with Chinese characters in real use, and the same goes for email addresses.
Many years have passed since the original question. If you want a copy-paste answer that actually works well, use the one provided at https://emailregex.com/
It handles many edge cases, but not all of them. If you want to catch every edge case, for example the completely valid "@v@"@example.com, you need to make the regex even longer. The regex taken from the website above looks like this in my C++ code:
#include <regex>

// General-purpose email pattern from emailregex.com, escaped for a C++ string literal.
// icase makes the a-z ranges match upper case as well; nosubs skips capture bookkeeping.
static const std::regex email_regex(
"(?:[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\\.[a-z0-9!#$%&'*+/"
"=?^_`{|}~-]+)*|\"(?:[\\x01-\\x08\\x0b\\x0c\\x0e-\\x1f\\x21\\x23-"
"\\x5b\\x5d-\\x7f]|\\\\[\\x01-\\x09\\x0b\\x0c\\x0e-\\x7f])*\")@(?:"
"(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\\.)+[a-z0-9](?:[a-z0-9-]*[a-"
"z0-9])?|\\[(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\\.){3}(?:"
"25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?|[a-z0-9-]*[a-z0-9]:(?:["
"\\x01-\\x08\\x0b\\x0c\\x0e-\\x1f\\x21-\\x5a\\x53-\\x7f]|\\\\["
"\\x01-\\x09\\x0b\\x0c\\x0e-\\x7f])+)\\])",
std::regex::nosubs | std::regex::ECMAScript | std::regex::icase);