Detect URL query string encoding

Developer https://www.devze.com 2023-02-07 06:35 Source: Internet
On a request URL, I can get the query string ?dir=Documents%20partag%C3%A9s or ?dir=Documents%20partag%E9s. I think the first one is UTF-8 and the second is ASCII.

The real string is: Documents partagés

So I have a PHP script (in UTF-8), and what I want to do is detect whether the query string is ASCII or UTF-8 and, if it is ASCII, convert it to UTF-8.

I tried the mb_ functions, but the raw query string is always detected as ASCII, and the urldecode'd version of the query string is always detected as UTF-8.

How can I achieve this? Note that Wikipedia does something similar: it re-encodes %E9 as %C3%A9 by itself.


E9 is 233 in decimal. It is not a valid ASCII byte (ASCII covers 0-127 only), but it is é in ISO-8859-1 (Latin-1). When using mb_convert_encoding, you can pass several candidate source encodings (here UTF-8 and ISO-8859-1).
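To make the byte-level difference concrete, here is a minimal sketch that decodes both variants from the question and checks them against each encoding. The variable names are only illustrative, and it assumes the mbstring extension is loaded.

```php
<?php
// Decode both variants of the query value and look at the raw bytes.
$latin1 = urldecode('Documents%20partag%E9s');    // é sent as one byte: 0xE9
$utf8   = urldecode('Documents%20partag%C3%A9s'); // é sent as two bytes: 0xC3 0xA9

// Hex dumps of the decoded strings; they differ only in the bytes for é.
echo bin2hex($latin1), "\n";
echo bin2hex($utf8), "\n";

var_dump(mb_check_encoding($latin1, 'ASCII')); // bool(false): 0xE9 is above 0x7F
var_dump(mb_check_encoding($latin1, 'UTF-8')); // bool(false): a lone 0xE9 is not valid UTF-8
var_dump(mb_check_encoding($utf8, 'UTF-8'));   // bool(true)
```

So the %E9 form is neither ASCII nor UTF-8 once decoded, which is why a Latin-1 fallback is needed.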

This should fix it:

mb_convert_encoding($str, 'UTF-8', 'UTF-8,ISO-8859-1');

With the following script:

$str1 = 'Documents%20partag%E9s';
$str2 = 'Documents%20partag%C3%A9s';
var_dump(mb_convert_encoding(urldecode($str1), 'UTF-8', 'UTF-8,ISO-8859-1'));
var_dump(mb_convert_encoding(urldecode($str2), 'UTF-8', 'UTF-8,ISO-8859-1'));

I get:

string(19) "Documents partagés"
string(19) "Documents partagés"
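A variant that does not rely on encoding detection is to validate the decoded bytes as UTF-8 first and convert only when that check fails. The sketch below is not part of the original answer: query_value_to_utf8 is a hypothetical helper name, and it assumes that any input that is not valid UTF-8 was sent as ISO-8859-1.

```php
<?php
/**
 * Hypothetical helper (illustration only): return a percent-encoded
 * query value as a UTF-8 string.
 * Assumption: anything that is not valid UTF-8 is ISO-8859-1.
 */
function query_value_to_utf8(string $raw): string
{
    $decoded = urldecode($raw);

    // Valid UTF-8 (which includes plain ASCII) is returned unchanged.
    if (mb_check_encoding($decoded, 'UTF-8')) {
        return $decoded;
    }

    // Otherwise treat the bytes as Latin-1 and convert them.
    return mb_convert_encoding($decoded, 'UTF-8', 'ISO-8859-1');
}

$a = query_value_to_utf8('Documents%20partag%E9s');
$b = query_value_to_utf8('Documents%20partag%C3%A9s');
var_dump($a === $b); // bool(true)
```

Values read from $_GET are already URL-decoded by PHP, so the urldecode call would be dropped when working with them. A Latin-1 string whose bytes happen to form valid UTF-8 would be left unconverted; that ambiguity cannot be resolved from the bytes alone.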