Is it possible for a valid URL to contain non-escaped Unicode characters?
Yes, but only the subset of ASCII (and therefore of Unicode) that is allowed unescaped in URIs, such as letters and digits. The great majority of the Unicode character set has to be percent-encoded.
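As a rough illustration of what percent-encoding looks like in practice, here is a minimal sketch using Python's standard urllib.parse module. The path "/wiki/café" is a made-up example, and UTF-8 is assumed as the byte encoding (it is the default for quote).

```python
from urllib.parse import quote, unquote

# Hypothetical path containing one non-ASCII character (é, U+00E9).
path = "/wiki/café"

# quote() leaves ASCII letters, digits, "_.-~" and "/" untouched and
# percent-encodes everything else as UTF-8 octets by default.
encoded = quote(path)
print(encoded)  # /wiki/caf%C3%A9

# Decoding with the same (UTF-8) assumption restores the original text.
decoded = unquote(encoded)
print(decoded == path)  # True
```

Plain ASCII letters and digits pass through unchanged, which is the "allowed unescaped" subset mentioned above.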
URI and URL do not natively support unescaped non-ASCII Unicode characters. Many servers do accept percent-encoded UTF-8 or localized ANSI octets, but there is no way to specify which encoding is actually used. For standardized native Unicode handling, use IRI instead, the newer standard (RFC 3987) that extends URI/URL. It requires UTF-8 when non-ASCII characters are percent-encoded, and provides rules for how to convert between IRI and URI.
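To make the IRI-to-URI conversion a bit more concrete, here is a simplified sketch in Python. It is not a full RFC 3987 implementation: the function name iri_to_uri and the example address are invented for illustration, userinfo is ignored, and the host is converted with Python's built-in "idna" codec (IDNA 2003), which is only one of the ways a host can be mapped. The last lines show why a bare percent-escape is ambiguous when the encoding is not known.

```python
from urllib.parse import quote, urlsplit, urlunsplit

# Characters left as-is in path, query and fragment. "%" is kept so that
# escapes already present in the input are not encoded a second time.
_SAFE = "/:@!$&'()*+,;=%?"


def iri_to_uri(iri: str) -> str:
    """Simplified IRI -> URI mapping (hypothetical helper, no userinfo support)."""
    parts = urlsplit(iri)

    # Host: convert non-ASCII labels to their ASCII ("xn--") form.
    host = parts.hostname or ""
    if host:
        host = host.encode("idna").decode("ascii")
    if parts.port is not None:
        host = f"{host}:{parts.port}"

    # Everything else: encode non-ASCII characters as UTF-8, then percent-escape.
    return urlunsplit((
        parts.scheme,
        host,
        quote(parts.path, safe=_SAFE),
        quote(parts.query, safe=_SAFE),
        quote(parts.fragment, safe=_SAFE),
    ))


uri = iri_to_uri("http://bücher.example/straße?q=café")
print(uri)  # http://xn--bcher-kva.example/stra%C3%9Fe?q=caf%C3%A9

# The same character gives different octets under different encodings,
# and the URI itself does not say which one was used.
utf8_form = quote("é")                       # %C3%A9
latin1_form = quote("é", encoding="latin-1")  # %E9
```

A server that receives %E9 cannot tell from the URI alone whether it means "é" in Latin-1 or an invalid UTF-8 sequence, which is the ambiguity that IRI avoids by fixing the encoding to UTF-8.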