
Are BSTR UTF-16 Encoded?

Developer https://www.devze.com 2023-01-22 09:20 Source: Internet
I'm in the process of trying to learn Unicode. For me the most difficult part is the encoding. Can BSTRs (Basic Strings) contain code points U+10000 or higher? If not, then what's the encoding for BSTRs?


In Microsoft-speak, Unicode is generally synonymous with UTF-16 (little endian if memory serves). In the case of BSTR, the answer seems to be it depends:

  • On Microsoft Windows, consists of a string of Unicode characters (wide or double-byte characters).
  • On Apple Power Macintosh, consists of a single-byte string.
  • May contain multiple embedded null characters.

So, on Windows, yes, it can contain characters outside the basic multilingual plane but these will require two 'wide' chars to store.
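To make the "two wide chars" point concrete, here is a minimal sketch of how a code point is split into UTF-16 code units. It is plain portable C and does not touch the BSTR API; the function name `encode_utf16` is made up for this illustration, and `uint16_t` stands in for the 16-bit `wchar_t` that Windows uses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: encode one Unicode code point as UTF-16.
   Writes 1 or 2 code units to out and returns how many were written.
   Returns 0 for values that are not valid Unicode scalar values
   (above U+10FFFF, or inside the surrogate range U+D800..U+DFFF). */
size_t encode_utf16(uint32_t cp, uint16_t out[2])
{
    assert(out != NULL);

    if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        return 0;

    if (cp < 0x10000) {
        /* Basic Multilingual Plane: one code unit, stored as-is. */
        out[0] = (uint16_t)cp;
        return 1;
    }

    /* Supplementary planes: subtract 0x10000, then split the remaining
       20 bits into a high (lead) and a low (trail) surrogate. */
    cp -= 0x10000;
    out[0] = (uint16_t)(0xD800 + (cp >> 10));   /* top 10 bits */
    out[1] = (uint16_t)(0xDC00 + (cp & 0x3FF)); /* bottom 10 bits */
    return 2;
}
```

For example, U+1F600 comes out as the pair 0xD83D 0xDE00, while U+0041 stays a single unit 0x0041.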


BSTRs on Windows originally contained UCS-2, but can in principle contain the entire Unicode set, using surrogate pairs. UTF-16 support is actually up to the API that receives the string: the BSTR has no say in how it gets treated. Most APIs support UTF-16 by now. (Michael Kaplan sorts out the details.)
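As a rough illustration of what "up to the API" means: the string only stores 16-bit units, so any code that cares about whole characters has to pair up surrogates itself. The sketch below is portable C rather than real BSTR code; `count_code_points` is a hypothetical name, and the unit count is passed in explicitly (on Windows it would typically come from `SysStringLen`).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: count code points in a buffer of UTF-16 code units.
   A high surrogate followed by a low surrogate counts as one code point.
   An unpaired surrogate is counted as one code point on its own. */
size_t count_code_points(const uint16_t *s, size_t units)
{
    assert(s != NULL || units == 0);

    size_t count = 0;
    for (size_t i = 0; i < units; i++) {
        int is_high = s[i] >= 0xD800 && s[i] <= 0xDBFF;
        int next_is_low = (i + 1 < units) &&
                          s[i + 1] >= 0xDC00 && s[i + 1] <= 0xDFFF;
        if (is_high && next_is_low)
            i++; /* skip the low surrogate: the pair is one code point */
        count++;
    }
    return count;
}
```

A buffer holding 'A', U+1F600, 'B' has four code units but three code points. An API that treats the data as UCS-2 would see four "characters" there, which is the difference described above.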

The Windows headers still contain an alternative definition for BSTR; it's basically:

#if defined(_WIN32) && !defined(OLE2ANSI)
   typedef wchar_t OLECHAR;
#else
   typedef char OLECHAR;
#endif
typedef OLECHAR * BSTR;

There's no real reason to consider the char, however, unless you desperately want to be compatible with whatever this was for. (IIRC it was active - or could be activated - for early MFC builds, and might even have been used in Office for Mac or something like that.)
