Is it possible to use UTF-8 in a subdomain? If so, which characters are allowed and how does the can't-mix-encodings thing work?
I've tried to RTFM, but Google wasn't of much help.
There isn't much that is special about subdomains. A given domain name, foo.example.com, is an ordered list of labels (foo, example, com). So the real question is whether you can use UTF-8 in a given label.
The low level answer is that a label is defined as:
<label> ::= <letter> [ [ <ldh-str> ] <let-dig> ]
<let-dig> ::= <letter> | <digit>
<letter> ::= any one of the 52 alphabetic characters A through Z in upper case and a through z in lower case
<digit> ::= any one of the ten digits 0 through 9
<ldh-str> ::= <let-dig-hyp> | <let-dig-hyp> <ldh-str>
<let-dig-hyp> ::= <let-dig> | "-"
which means that a label can only contain characters from [-a-zA-Z0-9], must start with a letter, and cannot end with a hyphen.
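To make the grammar concrete, here is a minimal Python sketch of a check for a single label. The names LABEL_RE and is_ldh_label are made up for illustration, and the 63-character cap is the label length limit from RFC 1035 rather than something the grammar itself expresses.

```python
import re

# Mirrors the grammar above: one letter, optionally followed by any run of
# letters, digits, or hyphens that ends in a letter or digit.
LABEL_RE = re.compile(r"[A-Za-z](?:[A-Za-z0-9-]*[A-Za-z0-9])?")


def is_ldh_label(label: str) -> bool:
    """Return True if `label` matches the <label> grammar (hypothetical helper)."""
    # RFC 1035 also limits a label to 63 octets.
    return len(label) <= 63 and LABEL_RE.fullmatch(label) is not None


if __name__ == "__main__":
    for candidate in ("foo", "foo-bar", "-foo", "foo-", "bücher"):
        print(candidate, is_ldh_label(candidate))
```

Note that this follows the quoted grammar strictly, so a label starting with a digit is rejected here even though later RFCs relax that rule for host names.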
However, IDNA can be used to encode Unicode characters. In short, a label containing other characters is encoded as: "xn--" + punycode(nameprep(label)).
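As a rough illustration of that formula, Python's standard library ships a punycode codec and an encodings.idna module with a nameprep function. The sketch below assumes those (they implement the older IDNA 2003 rules), and to_ace_label is a hypothetical helper name, not a library function.

```python
import encodings.idna


def to_ace_label(label: str) -> bytes:
    """Encode one label as "xn--" + punycode(nameprep(label)) (illustrative helper)."""
    # Plain ASCII labels are passed through untouched.
    if label.isascii():
        return label.encode("ascii")
    prepped = encodings.idna.nameprep(label)
    return b"xn--" + prepped.encode("punycode")


if __name__ == "__main__":
    print(to_ace_label("bücher"))                # b'xn--bcher-kva'
    # The built-in "idna" codec does the same thing label by label.
    print("bücher.example.com".encode("idna"))   # b'xn--bcher-kva.example.com'
```

The encoded form only uses letters, digits, and hyphens, so it still satisfies the label grammar above; the Unicode form is recovered on display by decoding it again.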
As for limitations, at least:
- four characters can't appear in an IDN label (U+002E, U+3002, U+FF0E, U+FF61), because IDNA treats all of them as dots, i.e. as label separators.
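The reason those four code points cannot appear inside a label is easier to see in code: a name is split into labels on any of them. This is a small sketch under that assumption; DOTS and split_labels are illustrative names.

```python
import re

# The four code points listed above: full stop, ideographic full stop,
# fullwidth full stop, halfwidth ideographic full stop.
DOTS = re.compile("[\u002E\u3002\uFF0E\uFF61]")


def split_labels(name: str) -> list[str]:
    """Split a domain name into labels on any of the four dot characters."""
    return DOTS.split(name)


if __name__ == "__main__":
    print(split_labels("foo\u3002example\uFF0Ecom"))  # ['foo', 'example', 'com']
```

Since each of these characters ends the current label, none of them can ever be part of one.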