I've got a string that is in my database like 中华武魂
When I send a request from my website to retrieve the data, it reaches the server in the format %E4%B8%AD%E5%8D%8E%E6%AD%A6%E9%AD%82
What decoding steps do I have to take in order to get it back to the usable form, while also cleaning the user input to ensure they're not going to try an SQL injection attack? (Escape the string before or after decoding?)
EDIT:
rawurldecode(); // returns "ä¸åŽæ¦é‚"
urldecode(); // returns "ä¸åŽæ¦é‚"
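public function utf8_urldecode($str) {
    // Turn non-standard %uXXXX escapes into HTML numeric entities, then decode them.
    $str = preg_replace("/%u([0-9a-f]{3,4})/i", "&#x\\1;", urldecode($str));
    // ENT_COMPAT was the default flag before PHP 8.1; passing null is deprecated since then.
    return html_entity_decode($str, ENT_COMPAT, 'UTF-8');
}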
// returns "ä¸åŽæ¦é‚"
... which actually works when I use it in an SQL statement. I think the garbled output appeared because I was doing an echo and die(); without specifying a UTF-8 header, so I guess the output was being read as Latin-1.
Thanks for the help!
When your data is actually in that percent-encoded form, you just have to call rawurldecode:
$data = '%E4%B8%AD%E5%8D%8E%E6%AD%A6%E9%AD%82';
$str = rawurldecode($data);
This suffices because the data is already encoded in UTF-8: 中 (U+4E2D) is encoded in UTF-8 as the byte sequence 0xE4B8AD, which percent-encoding writes as %E4%B8%AD.
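As a quick check of the claim above, here is a minimal sketch that decodes the string and inspects the raw bytes instead of relying on how a terminal or browser renders them. The variable names are only illustrative, and the "\u{...}" escape assumes PHP 7 or later.

```php
<?php
// Percent-encoded form of the string, as received by the server.
$data = '%E4%B8%AD%E5%8D%8E%E6%AD%A6%E9%AD%82';
$str = rawurldecode($data);

// Hex dump of the decoded bytes: four characters, three bytes each.
$hex = bin2hex($str);
echo $hex, "\n"; // e4b8ade58d8ee6ada6e9ad82

// The first three bytes are the UTF-8 encoding of U+4E2D.
var_dump(substr($str, 0, 3) === "\u{4E2D}"); // bool(true)
```

If the hex dump matches, the decoding step is fine and any garbled display comes from how the output is interpreted afterwards.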
Your output probably looks wrong because it is being interpreted with the wrong character encoding, most likely Windows-1252 instead of UTF-8. In Windows-1252, 0xE4 represents ä, 0xB8 represents ¸, 0xAD represents a soft hyphen (which is usually invisible), 0xE5 represents å, and so on. So make sure to specify the output character encoding properly.
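One way to specify the output encoding is to send a Content-Type header with a charset before any output. This is a minimal sketch that assumes an HTML response; adjust the media type to whatever you actually send.

```php
<?php
// Decode the percent-encoded input; the result is a UTF-8 byte string.
$body = rawurldecode('%E4%B8%AD%E5%8D%8E%E6%AD%A6%E9%AD%82');

// Declare the encoding before any output so the client reads the bytes as UTF-8.
header('Content-Type: text/html; charset=UTF-8');

echo $body;
```

The header() call has to come before the first echo, because headers cannot be changed once the body has started.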
Use PHP's urldecode: http://php.net/manual/en/function.urldecode.php
You have two choices here: urldecode or rawurldecode.
If you encoded your string using urlencode, you must use urldecode because of the way spaces are handled: urlencode converts spaces to +, whereas rawurlencode converts them to %20. Accordingly, urldecode turns + back into a space, while rawurldecode leaves + untouched.
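The difference in space handling can be seen with a short sketch; the sample string is made up for illustration.

```php
<?php
$text = 'a b+c';

// urlencode writes a space as "+", rawurlencode writes it as "%20".
$plus = urlencode($text);
$raw  = rawurlencode($text);
echo $plus, "\n"; // a+b%2Bc
echo $raw, "\n";  // a%20b%2Bc

// Only urldecode turns "+" back into a space.
echo urldecode('a+b'), "\n";    // a b
echo rawurldecode('a+b'), "\n"; // a+b
```

Both decoders handle %XX sequences identically, so for the string in the question, which contains no + or spaces, either one returns the same bytes.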