On a request URL, I can get the query string ?dir=Documents%20partag%C3%A9s or ?dir=Documents%20partag%E9s. I think the first one is UTF-8 and the second is ASCII.
The real string is: Documents partagés
So I have a PHP script (saved as UTF-8), and I want to detect whether the query string is ASCII or UTF-8 and, if it is ASCII, convert it to UTF-8.
I tried the mb_ functions, but the raw query string is always detected as ASCII and the urldecode version of the query string as UTF-8.
How can I achieve this? Note that Wikipedia does something similar: it re-encodes %E9 to %C3%A9 itself.
E9 is 233 in decimal. It is not a valid ASCII byte (ASCII covers 0-127 only), but it is é in ISO-8859-1 (Latin-1). When using mb_convert_encoding, you can specify multiple candidate source encodings (e.g. UTF-8 and ISO-8859-1).
This should fix it:
mb_convert_encoding($str, 'UTF-8', 'UTF-8,ISO-8859-1');
With the following script:
$str1 = 'Documents%20partag%E9s';
$str2 = 'Documents%20partag%C3%A9s';
var_dump(mb_convert_encoding(urldecode($str1), 'UTF-8', 'UTF-8,ISO-8859-1'));
var_dump(mb_convert_encoding(urldecode($str2), 'UTF-8', 'UTF-8,ISO-8859-1'));
I get:
string(19) "Documents partagés"
string(19) "Documents partagés"
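An alternative to passing a list of encodings is to test explicitly whether the decoded bytes are valid UTF-8 and only convert when they are not. The following is a minimal sketch of that idea, not part of the original answer: the helper name to_utf8 is hypothetical, it requires the mbstring extension, and it assumes that any input that is not valid UTF-8 is ISO-8859-1.

```php
<?php
// Hypothetical helper (not from the original answer): normalise a
// percent-encoded query value to a UTF-8 string.
function to_utf8(string $raw): string
{
    $decoded = urldecode($raw);

    // Valid UTF-8 (which includes plain ASCII) is returned unchanged.
    if (mb_check_encoding($decoded, 'UTF-8')) {
        return $decoded;
    }

    // Assumption: anything that is not valid UTF-8 is ISO-8859-1.
    return mb_convert_encoding($decoded, 'UTF-8', 'ISO-8859-1');
}

$fromLatin1 = to_utf8('Documents%20partag%E9s');
$fromUtf8   = to_utf8('Documents%20partag%C3%A9s');

// Both inputs normalise to the same UTF-8 string.
var_dump($fromLatin1 === $fromUtf8);   // bool(true)

// Re-encoding the result gives the UTF-8 form of the URL,
// like the Wikipedia behaviour mentioned in the question.
echo rawurlencode($fromLatin1), "\n";  // Documents%20partag%C3%A9s
```

This works because a lone E9 byte followed by an ASCII letter is not a valid UTF-8 sequence, so the check fails and the conversion runs. The approach is still a heuristic: an ISO-8859-1 string whose bytes happen to form valid UTF-8 (for example the two characters Ã©) would be left unconverted.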