How to support UTF-8 (Japanese, Arabic, Spanish, ...) URLs in PHP

开发者 (Developer) https://www.devze.com 2023-01-19 08:01 Source: the web
For a web application, we need to link to some user-generated content. A user types in a title, e.g. for a product, and we generate an SEO-friendly URL for that product, like this:

title: a nice product

www.user.com/product/a-nice-product

title: أبجد هوز

www.user.com/product/أبجد هوز

The problem is that those non-ASCII URLs aren't supported, and a browser refuses to open those links. I've seen WordPress setups support that kind of URL, so I guess it's possible to do this.

Does anyone know how we should support this in PHP?

Wikipedia handles this just fine: http://ar.wikipedia.org


Although the URL itself only allows US-ASCII characters, you can use Unicode characters in the URI path if you encode them as UTF-8 and then convert the resulting bytes into US-ASCII characters using percent-encoding:

A system that internally provides identifiers in the form of a different character encoding, such as EBCDIC, will generally perform character translation of textual identifiers to UTF-8 [STD63] (or some other superset of the US-ASCII character encoding) at an internal interface, thereby providing more meaningful identifiers than those resulting from simply percent-encoding the original octets.

So you can do something like this (assuming UTF-8):

$title = 'أبجد هوز';
$path = '/product/'.rawurlencode($title);
echo $path;  // "/product/%D8%A3%D8%A8%D8%AC%D8%AF%20%D9%87%D9%88%D8%B2"

Although the URI path is actually sent in percent-encoded form, most modern browsers will display the Unicode characters that the sequence represents when UTF-8 is used.
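As a rough sketch of the full round trip, assuming the title is valid UTF-8, the title can be turned into a hyphenated slug, percent-encoded for the link, and decoded again on the server. The helper names titleToPath() and pathToSlug() are made up for this example and do not come from the question.

```php
<?php
// Hypothetical helpers for illustration; both assume valid UTF-8 input.

// Build a link path: collapse whitespace runs to hyphens, then percent-encode.
function titleToPath(string $title): string
{
    // The /u modifier makes \s+ match on UTF-8 characters rather than bytes.
    $slug = preg_replace('/\s+/u', '-', trim($title));
    return '/product/' . rawurlencode($slug);
}

// Recover the slug from a request path (the path part only, no query string).
function pathToSlug(string $path): string
{
    $segments = explode('/', $path);
    return rawurldecode(end($segments));
}

$path = titleToPath('أبجد هوز');
echo $path, "\n";             // plain ASCII, percent-encoded path
echo pathToSlug($path), "\n"; // أبجد-هوز
```

Encoding each path segment separately, rather than the whole path at once, keeps the "/" separators intact.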


You're in trouble, I'm afraid. The encoding of the URL is at the discretion of the browser. I've encountered the same problem when trying to support URLs with Norwegian special characters, and it's simply not consistently possible.

You may be able to redirect a browser to the UTF-8 URL, but it might reply to you in ISO. It gets even worse in some cases, where browsers (Firefox, for instance) will mix ISO and UTF-8 encoding in the same URL (this happens particularly with GET parameters).

My suggestion is simple: don't do it. Use either English (better SEO too!) or spell it phonetically.
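One defensive approach, sketched here under two assumptions (the mbstring extension is loaded, and the only fallback encoding you ever receive is ISO-8859-1), is to decode the incoming segment and check whether the bytes are valid UTF-8 before using them. The function name normalizeSegment() is invented for this example.

```php
<?php
// Hypothetical helper; assumes mbstring is available and that any input
// that is not valid UTF-8 is ISO-8859-1 (PHP cannot detect this reliably).
function normalizeSegment(string $rawSegment): string
{
    $decoded = rawurldecode($rawSegment);
    if (mb_check_encoding($decoded, 'UTF-8')) {
        return $decoded; // already UTF-8, use as-is
    }
    // Fall back to treating the bytes as ISO-8859-1.
    return mb_convert_encoding($decoded, 'UTF-8', 'ISO-8859-1');
}

echo normalizeSegment('bl%C3%A5b%C3%A6r'), "\n"; // UTF-8 bytes in: blåbær
echo normalizeSegment('bl%E5b%E6r'), "\n";       // ISO-8859-1 bytes in: blåbær
```

This is a heuristic only: a few ISO-8859-1 byte sequences also happen to be valid UTF-8, so it reduces the problem rather than eliminating it.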


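You might need to use IDNA encoding if the non-ASCII portion of the URL is in the domain name. (IDNA covers host names only; non-ASCII characters in the path are percent-encoded instead.)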

http://en.wikipedia.org/wiki/Internationalized_domain_name
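To illustrate where IDNA fits, here is a sketch that assumes the intl extension is installed. idn_to_ascii() converts only the host name to its ASCII (Punycode) form, while the path still goes through percent-encoding. The host bücher.example is a placeholder.

```php
<?php
// Assumes the intl extension; the host and path below are placeholders.
$host = idn_to_ascii('bücher.example', IDNA_DEFAULT, INTL_IDNA_VARIANT_UTS46);
$path = '/product/' . rawurlencode('أبجد-هوز');

// The host becomes xn--bcher-kva.example; the path is percent-encoded.
echo 'http://' . $host . $path, "\n";
```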


You should urlencode the Arabic (or other Unicode) text:

urlencode('كلام-عربي')

It's also very important to declare the charset in the head tag of the page; otherwise the link will not work:

<meta charset="utf-8">
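A small sketch of how this might look in a template, assuming the page is served as UTF-8. Note that urlencode() encodes a space as "+", which is intended for query strings, whereas rawurlencode() produces "%20" and is the safer choice inside a path. The /product/ prefix is borrowed from the question; the title value is only an example.

```php
<?php
$title = 'كلام عربي';

echo urlencode($title), "\n";    // space becomes "+"
echo rawurlencode($title), "\n"; // space becomes "%20"

// The href is plain ASCII after encoding; the visible text is escaped for HTML.
$href = '/product/' . rawurlencode($title);
printf(
    '<a href="%s">%s</a>' . "\n",
    $href,
    htmlspecialchars($title, ENT_QUOTES, 'UTF-8')
);
```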