Convert utf-8 std::string to std::wstring on iPhone

Developer https://www.devze.com 2023-01-14 12:07 Source: Internet
I have a UTF-8 string (an std::string created from a byte array). I understand that, because of the encoding, size()/length() won't give me the actual number of glyphs if the text is Chinese, for instance. I also understand that to get the Unicode character code of each glyph I need to convert it to a wstring (or any representation wider than UTF-8), and then I can read the values I want.

I've looked around and haven't found any simple way to do it with standard C++. What am I missing?

I'm compiling with gcc 4+ for Apple's iPhone using the Cocoa Touch framework.


To get the number of UTF-8 characters (code points) in a std::string, you could traverse the string and look at each lead byte as an unsigned value: 0 to 127 starts a 1-byte character, 194 to 223 a 2-byte character, 224 to 239 a 3-byte character, and 240 to 244 a 4-byte character. After each lead byte, advance by that many bytes.
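The traversal described above could be sketched roughly like this. The function names utf8_sequence_length and count_code_points are made up for illustration, and the choice to throw on a bad lead byte or a truncated sequence is an assumption, not something the answer specifies.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Returns the number of bytes in the UTF-8 sequence that starts with `lead`,
// or 0 if `lead` cannot start a sequence (continuation byte or invalid value).
inline std::size_t utf8_sequence_length(unsigned char lead) {
    if (lead <= 0x7F) return 1;                  // 0..127: ASCII
    if (lead >= 0xC2 && lead <= 0xDF) return 2;  // 194..223
    if (lead >= 0xE0 && lead <= 0xEF) return 3;  // 224..239
    if (lead >= 0xF0 && lead <= 0xF4) return 4;  // 240..244
    return 0;
}

// Counts code points by hopping from lead byte to lead byte.
// Throws std::invalid_argument on a bad lead byte or a truncated sequence;
// it does not validate the continuation bytes themselves.
inline std::size_t count_code_points(const std::string& utf8) {
    std::size_t count = 0;
    std::size_t i = 0;
    while (i < utf8.size()) {
        // Cast first: plain char may be signed, so bytes >= 128 would be negative.
        const std::size_t len =
            utf8_sequence_length(static_cast<unsigned char>(utf8[i]));
        if (len == 0 || len > utf8.size() - i) {
            throw std::invalid_argument("malformed UTF-8");
        }
        i += len;
        ++count;
    }
    return count;
}
```

The cast to unsigned char matters because plain char is signed on many platforms, so comparing a raw char against 194 or 224 would never match.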

Since wchar_t on the iPhone is, I guess, 32 bits, if you really want a wstring you could use UTF8CPP to convert to UTF-32. UTF8CPP can also give you the code points of your string.

But I don't understand why you're using C++ on the iPhone. Look here: Objective-C Tuesdays: wide character strings


First of all, even if you convert your UTF-8 string to UTF-32 (and store it in a wstring), that does not mean each wchar_t will correspond to a single glyph, because one glyph can be built from several code points, such as a base letter plus a combining mark. See this text for some of the issues: http://www.unicode.org/reports/tr15/ .
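As a small illustration of that point, here is a sketch that counts code points in two byte sequences which both render as "é". The helper count_non_continuation_bytes and the two constants are hypothetical names, and the helper assumes well-formed UTF-8.

```cpp
#include <cstddef>
#include <string>

// Counts code points by counting every byte that is not a UTF-8
// continuation byte (continuation bytes have the form 10xxxxxx).
// Assumes the input is well-formed UTF-8.
inline std::size_t count_non_continuation_bytes(const std::string& utf8) {
    std::size_t n = 0;
    for (std::size_t i = 0; i < utf8.size(); ++i) {
        if ((static_cast<unsigned char>(utf8[i]) & 0xC0) != 0x80) ++n;
    }
    return n;
}

// Both strings render as the single glyph "é".
const std::string precomposed = "\xC3\xA9";   // U+00E9
const std::string decomposed  = "e\xCC\x81";  // U+0065 followed by U+0301 (combining acute)
```

The precomposed form is one code point and the decomposed form is two, so even a correct UTF-32 conversion would give wstrings of different lengths for what looks like the same text.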

Having said that, if you really need to convert a UTF-8 encoded string to UTF-32, you can use the UTF-8 CPP library like this:

// Requires <string>, <iterator>, and UTF-8 CPP's "utf8.h"; assumes wchar_t is 32 bits.
std::wstring utf32result;
utf8::utf8to32(utf8string.begin(), utf8string.end(), std::back_inserter(utf32result));
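If pulling in a library is not an option, the decoding step itself can be sketched by hand. This is not how UTF-8 CPP is implemented; decode_utf8 is a hypothetical helper, and it deliberately skips some checks (overlong encodings, surrogates, values above U+10FFFF) that a real library performs.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Decodes UTF-8 bytes into a sequence of code point values.
// Throws std::invalid_argument on a bad lead byte, a bad continuation byte,
// or a truncated sequence. Does NOT reject overlong encodings, surrogates,
// or values above U+10FFFF.
inline std::vector<std::uint32_t> decode_utf8(const std::string& utf8) {
    std::vector<std::uint32_t> out;
    std::size_t i = 0;
    while (i < utf8.size()) {
        const unsigned char lead = static_cast<unsigned char>(utf8[i]);
        std::uint32_t cp = 0;
        std::size_t extra = 0;  // number of continuation bytes to read
        if (lead <= 0x7F)               { cp = lead;        extra = 0; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; extra = 1; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; extra = 2; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; extra = 3; }
        else throw std::invalid_argument("invalid UTF-8 lead byte");

        if (extra > utf8.size() - i - 1) {
            throw std::invalid_argument("truncated UTF-8 sequence");
        }
        for (std::size_t k = 1; k <= extra; ++k) {
            const unsigned char c = static_cast<unsigned char>(utf8[i + k]);
            if ((c & 0xC0) != 0x80) {
                throw std::invalid_argument("invalid UTF-8 continuation byte");
            }
            cp = (cp << 6) | (c & 0x3F);  // append the low 6 payload bits
        }
        out.push_back(cp);
        i += extra + 1;
    }
    return out;
}
```

On a platform where wchar_t is 32 bits, the resulting values could be copied into a std::wstring with std::wstring(v.begin(), v.end()); where wchar_t is 16 bits, code points above U+FFFF would not fit in a single element.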


Boost provides a UTF-8 codecvt facet. You should be able to invoke it directly to perform conversions between UTF-8 encoded bytes and 32-bit wchar_t.


There is no notion of UTF-8 or Unicode in the C++ standard. You should check your available APIs or external libraries to perform your conversions.

Or you can write your own function to count the real number of characters in a UTF-8 encoded std::string. I don't think it's that difficult if you know how UTF-8 works.


Well, it is not simple and I have not used it myself, but the locale classes should help with converting your string. From the description, you can use the ctype::widen method to convert from a char to a wchar_t.
