
Is there an STL string class that properly handles Unicode?

Developer https://www.devze.com 2023-02-07 17:19 Source: the Internet

I know all about std::string and std::wstring, but they don't seem to fully handle the extended character encodings UTF-8 and UTF-16 (on Windows at least). There is also no support for UTF-32.

So does anyone know of cross-platform drop-in replacement classes that provide full UTF-8, UTF-16 and UTF-32 support?


And let's not forget the lightweight, very user-friendly, header-only UTF-8 library UTF8-CPP. Not a drop-in replacement, but can easily be used in conjunction with std::string and has no external dependencies.


Well, in C++0x there are the classes std::u32string and std::u16string. GCC already partially supports them, so you can use them now, but stream support for Unicode is not yet done. See "Unicode support in C++0x".
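To make the difference between these string types concrete, here is a minimal sketch that stores the same text in all three encodings and compares sizes. The names CodeUnitCounts, count_code_units and example are made up for illustration; only the standard string types and literal prefixes come from the language. Note that size() reports code units, not user-perceived characters.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper type: code-unit counts of one text in three encodings.
struct CodeUnitCounts {
    std::size_t utf8;
    std::size_t utf16;
    std::size_t utf32;
};

// The text is "na", U+00EF, "ve", U+1F600 (six code points).
inline CodeUnitCounts count_code_units() {
    // The UTF-8 bytes are spelled out so the result does not depend on the
    // source file encoding or on C++20's char8_t.
    const std::string    s8  = "na\xC3\xAFve\xF0\x9F\x98\x80";
    const std::u16string s16 = u"na\u00EFve\U0001F600";
    const std::u32string s32 = U"na\u00EFve\U0001F600";
    return {s8.size(), s16.size(), s32.size()};
}

// Usage: same text, three different lengths.
inline void example() {
    const CodeUnitCounts c = count_code_units();
    assert(c.utf8 == 10);  // 1+1+2+1+1+4 bytes
    assert(c.utf16 == 7);  // U+1F600 needs a surrogate pair
    assert(c.utf32 == 6);  // one unit per code point
}
```

Only std::u32string gives one element per code point here; with std::u16string and std::string you still have to deal with multi-unit sequences yourself.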


It's not STL, but if you want proper Unicode in C++, then you should take a look at ICU.


There is no support for UTF-8 in the STL. As an alternative you can use the Boost codecvt facet:

//...
// My encoding type
typedef wchar_t ucs4_t;

std::locale old_locale;
std::locale utf8_locale(old_locale,new utf8_codecvt_facet<ucs4_t>);

// Set a New global locale
std::locale::global(utf8_locale);

// Send the UCS-4 data out, converting to UTF-8
{
    std::wstringstream oss;
    oss.imbue(utf8_locale);
    std::copy(ucs4_data.begin(),ucs4_data.end(),
        std::ostream_iterator<ucs4_t,ucs4_t>(oss));

    std::wcout << oss.str() << std::endl;
}
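For comparison, the conversion itself (UCS-4 code points to UTF-8 bytes) can be sketched with the standard library alone. This is not Boost's implementation; append_utf8, to_utf8 and example are hypothetical names, and replacing invalid code points with U+FFFD is an assumption of this sketch.

```cpp
#include <cassert>
#include <string>

// Append the UTF-8 encoding of one code point to `out`.
// Returns false (and appends nothing) for surrogates and values above U+10FFFF.
inline bool append_utf8(std::string& out, char32_t cp) {
    if (cp >= 0xD800 && cp <= 0xDFFF) return false;  // surrogate range
    if (cp > 0x10FFFF) return false;                 // beyond Unicode
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return true;
}

// Convert a whole UTF-32 string; invalid code points become U+FFFD.
inline std::string to_utf8(const std::u32string& in) {
    std::string out;
    for (char32_t cp : in) {
        if (!append_utf8(out, cp)) append_utf8(out, 0xFFFD);
    }
    return out;
}

// Usage: one-, two-, three- and four-byte sequences.
inline void example() {
    assert(to_utf8(U"A") == "A");
    assert(to_utf8(U"\u00E9") == "\xC3\xA9");
    assert(to_utf8(U"\u20AC") == "\xE2\x82\xAC");
    assert(to_utf8(U"\U0001F600") == "\xF0\x9F\x98\x80");
}
```

A facet-based approach does the same work inside the stream, which is convenient when the data is going to a file anyway.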


For UTF-8 support, there is the Glib::ustring class. It is modeled after std::string but is UTF-8 aware, e.g. when you are scanning the string with an iterator. It also has some restrictions, e.g. the iterator is always const, because replacing a character can change the length of the string and so invalidate other iterators.

ustring does not automatically convert other encodings to UTF-8; the Glib library has various conversion functions for this. You can, however, validate whether a string is valid UTF-8.

And also, ustring and std::string are interchangeable, i.e. ustring has a cast operator to std::string so you can pass a ustring as a parameter where an std::string is expected, and vice versa of course, as ustring can be constructed from std::string.
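To show what "UTF-8 aware" means in practice, here is a rough sketch on top of plain std::string, without glibmm. The function names are hypothetical and the code assumes the input is already valid UTF-8; it is not how Glib::ustring is implemented.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Count code points by counting every byte that is not a continuation
// byte (bit pattern 10xxxxxx). Assumes valid UTF-8 input.
inline std::size_t count_code_points(const std::string& s) {
    std::size_t n = 0;
    for (unsigned char byte : s) {
        if ((byte & 0xC0) != 0x80) ++n;
    }
    return n;
}

// Byte length implied by the bit pattern of lead byte `b`, or 0 if `b`
// is a continuation byte or has no lead pattern. This looks at the
// pattern only; it does not reject overlong or out-of-range lead bytes.
inline std::size_t sequence_length(unsigned char b) {
    if (b < 0x80) return 1;
    if ((b & 0xE0) == 0xC0) return 2;
    if ((b & 0xF0) == 0xE0) return 3;
    if ((b & 0xF8) == 0xF0) return 4;
    return 0;
}

// Usage: "na", U+00EF, "ve" is 5 code points but 6 bytes.
inline void example() {
    const std::string s = "na\xC3\xAFve";
    assert(s.size() == 6);
    assert(count_code_points(s) == 5);
    assert(sequence_length(0xC3) == 2);
}
```

Because a character can take one to four bytes, swapping an ASCII "i" for U+00EF changes the byte length of the string, which is why an iterator that allowed assignment would be hard to keep valid.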


Qt has QString, which uses UTF-16 internally but has methods for converting to or from std::wstring, UTF-8, Latin1 or the locale encoding. There is also the QTextCodec class, which can convert QStrings to or from basically anything. But using Qt just for strings seems like overkill to me.


Also look at http://grigory.info/UTF8Strings.About.html; it is UTF-8 native.
