UTF-8 in subdomain?

开发者 https://www.devze.com 2023-02-17 17:16 Source: the web
Is it possible to use UTF-8 in a subdomain? If so, which characters are allowed and how does the can't-mix-encodings thing work?

I've tried to RTFM, but Google wasn't of much help.


There is nothing special about subdomains. A given domain name foo.example.com is an ordered list of labels (foo, example, com). So the real question is whether you can use UTF-8 in a given label.

The low-level answer is that a label is defined (in RFC 1035) as:

<label> ::= <letter> [ [ <ldh-str> ] <let-dig> ]
<let-dig> ::= <letter> | <digit>
<letter> ::= any one of the 52 alphabetic characters A through Z in upper case and a through z in lower case
<digit> ::= any one of the ten digits 0 through 9
<ldh-str> ::= <let-dig-hyp> | <let-dig-hyp> <ldh-str>
<let-dig-hyp> ::= <let-dig> | "-"

which means that a label can only contain characters from [-a-zA-Z0-9], and it can neither start nor end with a hyphen.
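
To make the grammar concrete, here is a minimal sketch that transcribes it into a regular expression. The names LABEL_RE and is_ldh_label are made up for this illustration, and the check follows the grammar literally, so it also rejects labels that start with a digit even though many resolvers accept them in practice.

```python
import re

# Literal transcription of the <label> grammar:
#   a letter first, then optionally any run of letters, digits or hyphens
#   that ends in a letter or digit.
# LABEL_RE and is_ldh_label are hypothetical names used only in this sketch.
LABEL_RE = re.compile(r"[A-Za-z](?:[A-Za-z0-9-]*[A-Za-z0-9])?")


def is_ldh_label(label: str) -> bool:
    """Return True if `label` matches the grammar quoted above."""
    return LABEL_RE.fullmatch(label) is not None


if __name__ == "__main__":
    for candidate in ["foo", "xn--bcher-kva", "bücher", "-foo", "foo-"]:
        print(candidate, is_ldh_label(candidate))
```

Note that an already-encoded label such as xn--bcher-kva passes the check, which is exactly why the encoding described next works with the existing DNS.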

However, IDNA can be used to encode Unicode characters. In short, a label containing other characters is encoded with: "xn--" + punycode(nameprep(label)).
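
As a rough illustration of that formula, Python's standard library ships an "idna" codec (implementing the original IDNA specification, RFC 3490) along with a "punycode" codec and an encodings.idna.nameprep helper. The function names to_ascii_label and to_ascii_by_hand below are invented for this sketch, and the hand-rolled version assumes the label really does contain non-ASCII characters.

```python
from encodings.idna import nameprep


def to_ascii_label(label: str) -> bytes:
    """Encode one label with the built-in IDNA codec."""
    return label.encode("idna")


def to_ascii_by_hand(label: str) -> bytes:
    """Spell out "xn--" + punycode(nameprep(label)).

    Assumption: `label` contains at least one non-ASCII character;
    pure-ASCII labels are left as they are and get no "xn--" prefix.
    """
    return b"xn--" + nameprep(label).encode("punycode")


if __name__ == "__main__":
    print(to_ascii_label("bücher"))    # b'xn--bcher-kva'
    print(to_ascii_by_hand("bücher"))  # b'xn--bcher-kva'
    print(to_ascii_label("foo"))       # b'foo'
```

Newer registries follow the later IDNA 2008 rules, which the built-in codec does not implement, so treat this only as a demonstration of the formula.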

As for limitations, at least:

  • four characters can't be in an IDN label, because IDNA treats each of them as a dot separating labels (U+002E, U+3002, U+FF0E, U+FF61).
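
The separator behaviour can be observed with Python's built-in "idna" codec, which splits a name on any of the four code points before encoding each label. This is only a sketch: SEPARATORS and encode_name are names chosen for the example, and other IDNA implementations may handle these characters differently.

```python
# The four code points listed above (full stop, ideographic full stop,
# fullwidth full stop, halfwidth ideographic full stop).
# SEPARATORS and encode_name are hypothetical names for this sketch.
SEPARATORS = ["\u002e", "\u3002", "\uff0e", "\uff61"]


def encode_name(name: str) -> bytes:
    """Encode a whole domain name, label by label, with the IDNA codec."""
    return name.encode("idna")


if __name__ == "__main__":
    for sep in SEPARATORS:
        # Each variant is treated as a dot between two labels.
        print(f"U+{ord(sep):04X}", encode_name("bücher" + sep + "example"))
```

Every line of output shows the same encoded name, xn--bcher-kva.example, so none of the four characters ever ends up inside a label.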