I need to make sure an email address is valid. I also need to check that there are no weird UTF characters in it. I am done with validating it with a regular expression:
^(([a-zA-Z0-9_\-\.]+)@([a-zA-Z0-9_\-\.]+)\.([a-zA-Z]{2,5}){1,25})+([;.](([a-zA-Z0-9_\-\.]+)@([a-zA-Z0-9_\-\.]+)\.([a-zA-Z]{2,5}){1,25})+)*$
But how do I check it for UTF?
Thanks
Are you trying to make sure it's valid w.r.t. RFC 5335 section 4.1?
If so, you can only check that a byte[] is valid UTF-8. It doesn't make sense to try to verify that a sequence of UTF-16 code units (Java chars) or code points is valid UTF-8, since UTF-8 is a mapping between byte strings and code-point strings, while UTF-16 is a mapping between 16-bit code units and Unicode scalar values. Section 3.9 of the Unicode Standard, "Unicode Encoding Forms", explains all this.
The best way to tell whether a byte[] is a well-formed UTF-8 sequence is to use one of the built-in decoders, e.g. the one obtained from StandardCharsets.UTF_8 or from the Guava equivalent, Charsets.UTF_8.
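A minimal sketch of that decoder-based check, assuming the address arrives as raw bytes. The class name Utf8Check and the method name isWellFormedUtf8 are made up for illustration; the decoder calls are the standard java.nio.charset ones.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of any library.
final class Utf8Check {
    // Returns true only if the bytes decode as UTF-8 without any error.
    static boolean isWellFormedUtf8(byte[] bytes) {
        // REPORT makes the decoder throw instead of silently substituting U+FFFD.
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }
}
```

Note that new String(bytes, StandardCharsets.UTF_8) would not work for this purpose, because it replaces malformed input rather than reporting it.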
If you want to make sure there are only ASCII characters in your email address, you can use this pattern:
"[^\\x00-\\x7F]"
It will match any non-ASCII character.
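As a rough sketch of how that pattern might be applied, assuming the address is already a Java String (AsciiCheck and isAsciiOnly are illustrative names, not an existing API):

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of any library.
final class AsciiCheck {
    // Matches a single character outside the ASCII range U+0000..U+007F.
    private static final Pattern NON_ASCII = Pattern.compile("[^\\x00-\\x7F]");

    // Returns true if no non-ASCII character occurs anywhere in the input.
    static boolean isAsciiOnly(String s) {
        // find() searches the whole string; matches() would require the
        // entire input to be exactly one non-ASCII character.
        return !NON_ASCII.matcher(s).find();
    }
}
```

The check uses find() because the pattern describes one offending character, so a single hit anywhere in the address is enough to reject it.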