I want to store Japanese text in a string and write it to a file. I am totally unfamiliar with encoding and there are a lot of data types like wchar_t and wstring in C++ which appear confusing to me. How can I do this?
I am trying to create a well-formed XML file with some CDATA content being Japanese.
Ignore the complexities and pitfalls of wide strings altogether, and make sure that the data you are dealing with is encoded as UTF-8 instead.
In C++, UTF-8 strings can be handled just like extended ASCII strings, unless you happen to be actually manipulating them (chopping them up, counting characters, things like that).
If all you care about is gathering, storing and displaying the strings, it is quite simply laughably trivial.
(Without more information about the environment in which you are working, it's impossible to tell you exactly how you would go about ensuring UTF-8-ness; but that's really beyond the scope of this question.)
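To make the "gathering and storing" case concrete, here is a minimal sketch, assuming the text is already UTF-8 by the time it reaches your std::string. The function name write_utf8_file and the constant kNihongo are made up for illustration; the bytes are written out exactly as they sit in the string.

```cpp
#include <fstream>
#include <string>

// Hypothetical helper: writes the bytes of `text` to `path` unchanged.
// Binary mode keeps the runtime from translating newlines.
// Returns true if the file was opened and the write succeeded.
bool write_utf8_file(const std::string& path, const std::string& text) {
    std::ofstream out(path, std::ios::binary);
    if (!out) {
        return false;
    }
    out.write(text.data(), static_cast<std::streamsize>(text.size()));
    return static_cast<bool>(out);
}

// "日本語" spelled out as UTF-8 bytes, so the example does not depend on
// the encoding of the source file or on the compiler's handling of literals.
const std::string kNihongo = "\xE6\x97\xA5\xE6\x9C\xAC\xE8\xAA\x9E";
```

Calling write_utf8_file("out.txt", kNihongo) should leave a 9-byte file that any UTF-8-aware editor shows as three characters.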
Edit:
In response to comments regarding what you are planning to do (writing an XML file):
When working with XML in particular, it's very, very simple:
Never Don't Use UTF-8!, or "N'DUUH!" for short.
In XML, the share of ASCII (markup, element names, whitespace) will in practice almost always be high enough that UTF-8 is the most space-efficient encoding.
(To wit, a typical Japanese character takes 3 bytes in UTF-8 and 2 in UTF-16, while an ASCII character takes 1 byte in UTF-8 and 2 in UTF-16. So if each Japanese character in the document can be matched by an ASCII character, UTF-8 is exactly as efficient as UTF-16, in terms of space. XML element names are traditionally needlessly verbose, and Japanese sentences are notoriously compact; and when adding in indentation, Japanese text will almost always be matched by ASCII in abundance.)
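The break-even point can be checked with a bit of arithmetic. This is only a sketch: the function names are made up, and it assumes every Japanese character comes from the Basic Multilingual Plane (kana and common kanji do; rare characters outside it take 4 bytes in both encodings).

```cpp
#include <cstddef>

// Hypothetical helpers for the size comparison.
// ASCII: 1 byte in UTF-8. BMP Japanese (kana, common kanji): 3 bytes in UTF-8.
constexpr std::size_t utf8_bytes(std::size_t ascii, std::size_t bmp_japanese) {
    return ascii * 1 + bmp_japanese * 3;
}

// Every BMP character, ASCII included, is one 2-byte code unit in UTF-16.
constexpr std::size_t utf16_bytes(std::size_t ascii, std::size_t bmp_japanese) {
    return (ascii + bmp_japanese) * 2;
}

// One ASCII character per Japanese character: both encodings cost the same.
static_assert(utf8_bytes(10, 10) == utf16_bytes(10, 10), "break-even at 1:1");
```

With three ASCII characters per Japanese character, which is easy to reach once tags and indentation are counted, utf8_bytes(30, 10) is 60 against 80 for utf16_bytes(30, 10).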
wchar_t and std::wstring can store Unicode text, so it's safe to manage and write them to a file.
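Be advised that on Windows sizeof(wchar_t)==2 (on most Linux and macOS toolchains it is 4), while sizeof(char)==1 everywhere, so the byte count to write is the character count times sizeof(wchar_t):
::WriteFile(m_hFile, strString.c_str(), static_cast<DWORD>(strString.length()*sizeof(wchar_t)), pdwWritten, NULL);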
I am trying to create a well-formed XML file with some CDATA content being Japanese.
That's not necessarily a good idea. The xml:lang attribute is generally how you tell what language a piece of text contained in XML is in, and you can't apply attributes to CDATA sections. So these should be in some kind of XML element that can have a proper xml:lang attribute on it.
In any case, you need to pick an encoding. The entire file must have the same encoding. So make sure to specify your encoding in the XML declaration at the top of the file. Please don't make XML parsers guess your encoding.
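As a rough sketch of both points (an element carrying xml:lang, and an explicit encoding in the declaration), something like the following would do. The element name note and both function names are invented for illustration, and the input is assumed to already be valid UTF-8. Because the text goes into ordinary element content rather than CDATA, the markup characters have to be escaped.

```cpp
#include <string>

// Escapes the characters that are special in XML character data.
std::string escape_xml_text(const std::string& text) {
    std::string out;
    out.reserve(text.size());
    for (char c : text) {
        switch (c) {
            case '&': out += "&amp;"; break;
            case '<': out += "&lt;";  break;
            case '>': out += "&gt;";  break;
            default:  out += c;       break;
        }
    }
    return out;
}

// Builds a tiny document: a declaration that names the encoding, and one
// hypothetical <note> element tagged as Japanese via xml:lang="ja".
// `utf8_text` is assumed to be valid UTF-8; it is not validated here.
std::string make_xml_document(const std::string& utf8_text) {
    return "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
           "<note xml:lang=\"ja\">" + escape_xml_text(utf8_text) + "</note>\n";
}
```

The resulting string can then be written to disk byte for byte; the declared encoding only stays truthful if nothing re-encodes the text on the way out.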
If you're used to writing bytes, I'd suggest UTF-8, as you sidestep a lot of endian issues that you might encounter on other platforms. Each UTF-8 code unit is a char, so you can use std::string to hold these (though you will have to process them carefully).
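One example of why care is needed: size() on a std::string holding UTF-8 gives bytes, not characters. A minimal sketch of counting code points instead, assuming the input is valid UTF-8 (the function name is made up, and no validation is done):

```cpp
#include <cstddef>
#include <string>

// Counts Unicode code points in a UTF-8 string by skipping continuation
// bytes, which always have the bit pattern 10xxxxxx.
// Assumes `utf8` is well-formed; malformed input gives a meaningless count.
std::size_t count_code_points(const std::string& utf8) {
    std::size_t count = 0;
    for (unsigned char byte : utf8) {
        if ((byte & 0xC0) != 0x80) {
            ++count;  // lead byte or plain ASCII byte starts a new code point
        }
    }
    return count;
}
```

For the 9-byte string "\xE6\x97\xA5\xE6\x9C\xAC\xE8\xAA\x9E" ("日本語"), size() reports 9 while this function reports 3. Note that code points are still not the same thing as user-visible characters once combining marks are involved.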