Can anyone tell me how to convert a UTF-8 value to a UCS-2 value in Objective-C?

Developer https://www.devze.com 2023-03-26 14:11 Source: Internet
I am trying to convert a UTF-8 string into a UCS-2 string. I need to get a string like "\uFF0D\uFF0D\u6211\u7684\u4E0A\u7F51\u4E3B\u9875". I have been searching for about a month now, but I still cannot find any reference on converting UTF-8 to UCS-2. Could someone please help me? Thanks in advance.

EDIT: Okay, maybe my explanation was not good enough. Here is what I am trying to do. I live in Korea, and I am trying to send an SMS message using CTMessageCenter. I tried to send Simplified Chinese characters through my app, and I got ???? instead of the proper characters. So I tried UTF-8 and UTF-16, both BE and LE, but they all produced ??. Finally I found out that SMS uses UCS-2 and EUC-KR encoding in Korea. Weird, isn't it? Anyway, I tried sending a string like \u4E3B\u9875 and it worked. So I need to convert the string into UCS-2 encoding first and then get the string literal from the result.


Wikipedia:

The older UCS-2 (2-byte Universal Character Set) is a similar character encoding that was superseded by UTF-16 in version 2.0 of the Unicode standard in July 1996. It produces a fixed-length format by simply using the code point as the 16-bit code unit and produces exactly the same result as UTF-16 for 96.9% of all the code points in the range 0-0xFFFF, including all characters that had been assigned a value at that time.

IBM:

Since the UCS-2 standard is limited to 65,535 characters, and the data processing industry needs over 94,000 characters, the UCS-2 standard is in the process of being superseded by the Unicode UTF-16 standard.

However, because UTF-16 is a superset of the existing UCS-2 standard, you can develop your applications using the system's existing UCS-2 support as long as your applications treat the UCS-2 as if it were UTF-16.

unicode.org:

UCS-2 is obsolete terminology which refers to a Unicode implementation up to Unicode 1.1, before surrogate code points and UTF-16 were added to Version 2.0 of the standard. This term should now be avoided.

UCS-2 does not define a distinct data format, because UTF-16 and UCS-2 are identical for purposes of data exchange. Both are 16-bit, and have exactly the same code unit representation.

So, using the "UTF8toUnicode" transformation in most language libraries will produce UTF-16, which is essentially UCS-2. And simply extracting the 16-bit characters from an Objective-C string will accomplish the same thing.

In other words, the solution has been staring you in the face all along.
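
To make that concrete, here is a minimal sketch of the conversion in plain C, which also compiles unchanged inside an Objective-C file. It decodes each UTF-8 sequence to a code point and prints it as a `\uXXXX` escape, refusing anything above U+FFFF because UCS-2 cannot represent it. The function names `utf8_decode` and `utf8_to_ucs2_escapes` are made up for this illustration, and the decoder is deliberately simplified (it does not reject overlong forms or encoded surrogates), so treat it as a starting point rather than a finished implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: decode one UTF-8 sequence from a NUL-terminated
 * string. Stores the code point in *cp and returns the number of bytes
 * consumed, or 0 if the sequence is malformed. Simplified: overlong
 * forms and encoded surrogates are not rejected. */
static size_t utf8_decode(const unsigned char *s, uint32_t *cp)
{
    size_t len;
    if (s[0] < 0x80)                { *cp = s[0];        return 1; }
    else if ((s[0] & 0xE0) == 0xC0) { *cp = s[0] & 0x1F; len = 2; }
    else if ((s[0] & 0xF0) == 0xE0) { *cp = s[0] & 0x0F; len = 3; }
    else if ((s[0] & 0xF8) == 0xF0) { *cp = s[0] & 0x07; len = 4; }
    else return 0;              /* stray continuation or invalid lead byte */

    for (size_t i = 1; i < len; i++) {
        /* A premature NUL fails this check too, so we never read past it. */
        if ((s[i] & 0xC0) != 0x80) return 0;
        *cp = (*cp << 6) | (s[i] & 0x3F);
    }
    return len;
}

/* Hypothetical helper: write the string as \uXXXX escapes, one per
 * 16-bit code unit. Returns 0 on success, or -1 on malformed input,
 * on a code point above U+FFFF (not representable in UCS-2), or when
 * the output buffer is too small. */
int utf8_to_ucs2_escapes(const char *utf8, char *out, size_t out_size)
{
    const unsigned char *s = (const unsigned char *)utf8;
    size_t used = 0;

    if (out_size == 0) return -1;
    out[0] = '\0';

    while (*s != '\0') {
        uint32_t cp;
        size_t n = utf8_decode(s, &cp);
        if (n == 0 || cp > 0xFFFF) return -1;
        if (out_size - used < 7) return -1;   /* 6 characters plus the NUL */
        snprintf(out + used, out_size - used, "\\u%04X", (unsigned)cp);
        used += 6;
        s += n;
    }
    return 0;
}
```

Calling `utf8_to_ucs2_escapes` on the UTF-8 bytes for 主页 fills the buffer with `\u4E3B\u9875`, the form that worked in the question. If the text is already in an `NSString`, no decoding is needed: `characterAtIndex:` returns the UTF-16 code units directly, so a loop from 0 to `length` that formats each `unichar` with `%04X` should give the same escapes.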


UCS-2 is not a valid Unicode encoding. UTF-8 is.

It is therefore impossible to convert UTF-8 into UCS-2 — and indeed, also the reverse.

UCS-2 is dead, ancient history. Let it rot in peace.
