I'm trying to mimic the json_encode
bitmask flags implemented in PHP 5.3.0, here is the string I have:
$s = addslashes('O\'Rei"lly'); // O\'Rei\"lly
Doing json_encode($s, JSON_HEX_APOS | JSON_HEX_QUOT)
outputs the following:
"O\\\u0027Rei\\\u0022lly"
And I'm currently doing this in PHP versions older than 5.3.0:
str_replace(arr开发者_如何学JAVAay('\\"', "\\'"), array('\\u0022', '\\\u0027'), json_encode($s))
or
str_replace(array('\\"', '\\\''), array('\\u0022', '\\\u0027'), json_encode($s))
Which correctly outputs the same result:
"O\\\u0027Rei\\\u0022lly"
I'm having trouble understanding why do I need to replace single quotes ('\\\''
or even "\\'"
[surrounding quotes excluded]) with '\\\u0027'
and not just '\\u0027'
.
Here is the code that I'm having trouble porting to PHP < 5.3:
if (get_magic_quotes_gpc() && version_compare(PHP_VERSION, '6.0.0', '<'))
{
/* JSON_HEX_APOS and JSON_HEX_QUOT are availiable */
if (version_compare(PHP_VERSION, '5.3.0', '>=') === true)
{
$_GET = json_encode($_GET, JSON_HEX_APOS | JSON_HEX_QUOT);
$_POST = json_encode($_POST, JSON_HEX_APOS | JSON_HEX_QUOT);
$_COOKIE = json_encode($_COOKIE, JSON_HEX_APOS | JSON_HEX_QUOT);
$_REQUEST = json_encode($_REQUEST, JSON_HEX_APOS | JSON_HEX_QUOT);
}
/* mimic the behaviour of JSON_HEX_APOS and JSON_HEX_QUOT */
else if (extension_loaded('json') === true)
{
$_GET = str_replace(array(), array('\\u0022', '\\u0027'), json_encode($_GET));
$_POST = str_replace(array(), array('\\u0022', '\\u0027'), json_encode($_POST));
$_COOKIE = str_replace(array(), array('\\u0022', '\\u0027'), json_encode($_COOKIE));
$_REQUEST = str_replace(array(), array('\\u0022', '\\u0027'), json_encode($_REQUEST));
}
$_GET = json_decode(stripslashes($_GET));
$_POST = json_decode(stripslashes($_POST));
$_COOKIE = json_decode(stripslashes($_COOKIE));
$_REQUEST = json_decode(stripslashes($_REQUEST));
}
The PHP string
'O\'Rei"lly'
is just PHP's way of getting the literal value
O'Rei"lly
into a string which can be used. Calling addslashes
on that string changes it to be literally the following 11 characters
O\'Rei\"lly
i.e. strlen(addslashes('O\'Rei"lly')) == 11
This is the value which is being sent to json_escape
.
In JSON backslash is an escape character, so that needs to be escaped, i.e.
\
to be \\
Also single and double quotes can cause problems. So converting them to their unicode equivalent in one way to avoid problems. So later verions of PHP's json_encode change
'
to be \u0027
and
"
to be \u0022
So applying these three rules to
O\'Rei\"lly
gives us
O\\\u0027Rei\\\u0022lly
This string is then wrapped in double quotes to make it a JSON string. Your replace expressions include the leading forward slashes. Either by accident or on purpose this means that the leading and trailing double quote returned by json_encode
is not subject to the escaping, which it shouldn't be.
So in earlier versions of PHP
$s = addslashes('O\'Rei"lly');
print json_encode($s);
would print
"O\\'Rei\\\"lly"
and we want to change '
to be \u0027
and we want to change \"
to be \u0022
because the \
in \"
is just to get the "
into the string because it begins and ends with double-quotes.
So that's why we get
"O\\\u0027Rei\\\u0022lly"
It's escaping the backslash as well as the quote. It's difficult dealing with escaped escapes, as you're doing here, as it quickly turns into backslash counting games. :-/
If I understand correctly, you just want to know why you need to use
'\\\u0027'
and not just'\\u0027'
You're escaping the slash and the character unicode value. With this you are telling json that it should put an apostrophe there, but it needs the backslash and the u to know that a unicode hexadecimal character code is next.
Since you are escaping this string:
$s = addslashes('O\'Rei"lly'); // O\'Rei\"lly
the first backslash is actually escaping the backslash before the apostrophe. Then next slash is used to escape the backslash used by json to identify the character as a unicode character.
If you were appplying the algorythm to O'Reilly instead of O\'Rei\"lly then the latter would suffice.
I hope you find this useful. I only leave you this link so you can read more on how json is constructed, since its obvious you already understand PHP:
http://www.json.org/fatfree.html
When you encode a string for json, some things have to be escaped regardless of the options. As others have pointed out, that includes '\' so any backslash run through json_encode will be doubled. Since you are first running your string through addslashes, which also adds backslashes to quotes, you are adding a lot of extra backslashes. The following function will emulate how json_encode would encode a string. If the string has already had backslashes added, they will be doubled.
function json_encode_string( $encode , $options ) {
$escape = '\\\0..\37';
$needle = array();
$replace = array();
if ( $options & JSON_HEX_APOS ) {
$needle[] = "'";
$replace[] = '\u0027';
} else {
$escape .= "'";
}
if ( $options & JSON_HEX_QUOT ) {
$needle[] = '"';
$replace[] = '\u0022';
} else {
$escape .= '"';
}
if ( $options & JSON_HEX_AMP ) {
$needle[] = '&';
$replace[] = '\u0026';
}
if ( $options & JSON_HEX_TAG ) {
$needle[] = '<';
$needle[] = '>';
$replace[] = '\u003C';
$replace[] = '\u003E';
}
$encode = addcslashes( $encode , $escape );
$encode = str_replace( $needle , $replace , $encode );
return $encode;
}
Since you are going to json_encode
the string \'
you will have to encode first the \
then the '
. So you will have \\
and \u0027
. Concatenating these results \\\u0027
.
The \
generated by addslashes()
get re-escaped by json_encode()
. You probably meant to say this Doing json_encode($s, JSON_HEX_APOS | JSON_HEX_QUOT) outputs the following
but you used $str
instead of $s
, which confused everyone.
If you evaluate the string "O\\\u0027Rei\\\u0022lly"
in JavaScript, you will get "O\'rei\"lly"
and I am pretty sure that's not what you want. When you evaluate it, you probably need all the control codes removed. Go ahead, poke this in a file: alert("O\\\u0027Rei\\\u0022lly")
.
Conclusion: You are escaping the quotes twice, which is most likely not what you need. json_encode
already escapes everything that is needed so that any JavaScript parser would return the original data structure. In your case, that is the string you have obtained after the call to addslashes
.
Proof:
<?php $out = json_encode(array(10, "h'ello", addslashes("h'ello re-escaped"))); ?>
<script type="text/javascript">
var out = <?php echo $out; ?>;
alert(out[0]);
alert(out[1]);
alert(out[2]);
</script>
精彩评论