Is PHP's json_encode guaranteed to produce an ASCII string?

Developer https://www.devze.com 2022-12-27 00:51 Source: Internet
Well, the subject says everything. I'm using json_encode to convert some UTF-8 data to JSON and I need to transfer it to some layer that is currently ASCII-only. So I wonder whether I need to make that layer UTF-8 aware, or whether I can leave it as it is.

Looking at the JSON RFC, UTF-8 is also a valid charset in JSON output, although not recommended, i.e. some implementations can leave UTF-8 data inside. The question is whether PHP's implementation dumps everything as ASCII or opts to leave something as UTF-8.


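Unlike JSON support in other languages, json_encode() by default does not generate anything other than ASCII: non-ASCII characters are written as \uXXXX escapes. (Since PHP 5.4 the JSON_UNESCAPED_UNICODE flag lets you opt out of this, so the guarantee only holds if nobody passes that flag.)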
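To see this for yourself, here is a minimal sketch that compares the default output with the output under JSON_UNESCAPED_UNICODE (a flag available since PHP 5.4). The sample data and the is_ascii() helper are made up for illustration; the input is written with byte escapes so the result does not depend on how the file is saved.

```php
<?php
// Hypothetical helper: true when every byte is in the 7-bit ASCII range.
function is_ascii(string $s): bool
{
    return preg_match('/^[\x00-\x7F]*$/', $s) === 1;
}

// "Zoë" spelled out as UTF-8 bytes (sample data, not from the question).
$data = ['name' => "Zo\xC3\xAB"];

$default = json_encode($data);                         // {"name":"Zo\u00eb"}
$raw     = json_encode($data, JSON_UNESCAPED_UNICODE); // keeps the raw UTF-8 bytes

var_dump(is_ascii($default)); // bool(true)
var_dump(is_ascii($raw));     // bool(false)
```

So as long as the flag is not used, the output should be safe for an ASCII-only layer.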


According to the JSON article in Wikipedia, Unicode characters in strings are always

double-quoted Unicode with backslash escaping

The examples in the PHP Manual on json_encode() seem to confirm this.

So any character outside ASCII should be escaped like this: \u00e9 (note, as @Ignacio points out in the comments, that this is the recommended way to deal with those characters, not a required one).

However, I suppose json_decode() will convert the characters back to their byte values? You may get in trouble there.
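A quick round trip suggests that this is indeed what happens: the JSON text is ASCII in transit, but json_decode() hands back the original UTF-8 bytes. This is only a sketch with a made-up sample string:

```php
<?php
$original = "Zo\xC3\xAB";          // "Zoë" as UTF-8 bytes (sample data)

$json    = json_encode($original); // "Zo\u00eb" including the quotes, ASCII only
$decoded = json_decode($json);     // the escape is turned back into UTF-8 bytes

var_dump($decoded === $original);  // bool(true)
var_dump(strlen($json));           // int(10)
var_dump(strlen($decoded));        // int(4)
```

In other words, the ASCII-only layer is fine as a transport, but whatever decodes the JSON on the other side still ends up holding UTF-8.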

If you need to be sure, take a look at iconv(), which can convert your UTF-8 string to ASCII beforehand (with ASCII//IGNORE as the target charset it drops any unsupported characters, so this is lossy).
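A rough sketch of that iconv() approach follows. The to_ascii() helper is hypothetical, and the exact result depends on the iconv implementation PHP was built against (some drop the character, some substitute a placeholder, some report an error), so the code checks for failure and falls back to stripping non-ASCII bytes itself.

```php
<?php
// Hypothetical helper: best-effort, lossy reduction of a UTF-8 string to ASCII.
function to_ascii(string $utf8): string
{
    // //IGNORE asks iconv to discard characters the target charset cannot
    // represent. Notices are suppressed because behaviour varies by platform.
    $out = @iconv('UTF-8', 'ASCII//IGNORE', $utf8);

    if ($out === false) {
        // Fallback: remove every byte outside the 7-bit range.
        $out = (string) preg_replace('/[^\x00-\x7F]/', '', $utf8);
    }

    return $out;
}

$clean = to_ascii("Zo\xC3\xAB"); // sample input, "Zoë" as UTF-8 bytes
var_dump($clean);
```

Keep in mind that this throws data away, whereas the \uXXXX escaping done by json_encode() does not.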


Well, json_encode returns a string. According to the PHP documentation for string:

A string is series of characters. Before PHP 6, a character is the same as a byte. That is, there are exactly 256 different characters possible. This also implies that PHP has no native support of Unicode. See utf8_encode() and utf8_decode() for some basic Unicode functionality.

So for the time being you do not need to worry about making it UTF-8 aware. Of course you still might want to think about this anyway, to future-proof your code.
