开发者

Can HTTP URIs have non-ASCII characters?

开发者 https://www.devze.com 2022-12-24 22:55 出处:网络
I tried to find this in the relevant RFC, IETF RFC 3986, but couldn\'t figure it. Do URIs for HTTP allow Unicode, or开发者_JAVA技巧 non-ASCII of any kind?

I tried to find this in the relevant RFC, IETF RFC 3986, but couldn't figure it.

Do URIs for HTTP allow Unicode, or开发者_JAVA技巧 non-ASCII of any kind?

Can you please cite the section and the RFC that supports your answer.

NB: For those who might think this is not programming related - it is. It's related to an ISAPI filter I'm building.


Addendum

I've read section 2.5 of RFC 3986. But RFC 2616, which I believe is the current HTTP protocol, predates 3986, and for that reason I'd suppose it cannot be compliant with 3986. Furthermore, even if or when the HTTP RFC is updated, there still will be the issue of rationalization - in other words, does an HTTP URI support ALL of the RFC3986 provisos, including whatever is appropriate to include non US-ASCII characters?


http://en.wikipedia.org/wiki/Internationalized_domain_name


No, they are not allowed. Just check the ABNF in RFC 3986.


Here is an example: ☃.net.

In terms of the relevant section of RFC 3986, I think you are looking at 2.5.

EDIT:

Apparently stack overflow doesn't detect this as a proper URL. You'll have to copy&paste into your browser.


Used to be that non english characters were not allowed in DNS and URL/URI. There was a hack to allow them by using % encoding in URI. However many countries such us russia and china are starting to implement DNS using non latin characters. Here is a reference to one of these standards


RFC 3986 is being replaced with RFC 3987, which fully supports Unicode, and provides mappings rules to/from RFC 3986 style URIs.


Many browsers are not support URIs with Unicode characters (I've implemented them on a website I've build called -- blogvani.com) and Google duly scans and keeps them intact. I don't think that works on top-level domains though, at least not with the registrar and not directly.

For top-level domains if you have a domain registered in Unicode (for example people can register domains in Hindi), it will be converted to a corresponding code in ASCII (something that may go like jdhfks3243-32434.com)...

It is quite funny to see how this is routed and to realize that you're not actually going to a unicode domain even though it seems like that.

0

精彩评论

暂无评论...
验证码 换一张
取 消

关注公众号