String conversion from UTF-8 to UTF-16 Big endian is failing (using C, C++ language)

Developer (https://www.devze.com), 2023-01-24 20:51. Source: Internet
I am using the GLib function g_convert() to convert a UTF-8 string to a UTF-16 big-endian string. The conversion fails with an error saying "conversion is not supported".

Could someone give me a clue on how to overcome this issue?

Thanks

The following is the code used to convert the string from UTF-8 to UTF-16 big-endian:

unsigned short *result_str;
gsize bytes_read, bytes_written;
gssize len = 0;
GError *error = NULL;

result_str = (unsigned short *)g_convert("text data", len, "UTF-16BE", "UTF-8", &bytes_read, &bytes_written, &error);


Your len is 0. The GLib manual says that len must be -1 for a NUL-terminated string.


g_convert uses iconv underneath the covers.

On my machine, using Cygwin, I can run

iconv -l 

which lists the supported encodings, and UTF-16BE does appear in the list. However:

$ iconv -l | grep BE
UCS-2BE UNICODE-1-1 UNICODEBIG CSUNICODE11
UCS-4BE
UTF-16BE
UTF-32BE

James@XPL3KWK28 ~
$ iconv -f UTF-8 -t UTF16-BE
iconv: conversion to UTF16-BE unsupported
iconv: try 'iconv -l' to get the list of supported encodings

As you can see, it does not support the conversion to or from UTF-8.

You probably need to do this in two stages: UTF-8 to UTF-16, then UTF-16 to UTF-16BE.


I suspect UTF-16BE is not supported by g_convert (based on the error message). It's trivial to convert UTF-8 into UTF-16BE though (no tables or other garbage like that) -- you can do that transformation yourself.
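A rough sketch of doing the transformation by hand, in plain C with no GLib or iconv dependency. The function names `put_be16` and `utf8_to_utf16be_bytes` are hypothetical, and the validation is deliberately basic (it rejects truncated sequences, overlong forms, encoded surrogates and values above U+10FFFF); treat it as an illustration rather than a hardened decoder.

```c
#include <stddef.h>
#include <stdint.h>

/* Writes one 16-bit code unit, most significant byte first. */
static void put_be16(unsigned char *out, uint32_t unit)
{
    out[0] = (unsigned char)(unit >> 8);
    out[1] = (unsigned char)(unit & 0xFF);
}

/* Hypothetical helper: decodes NUL-terminated UTF-8 from `in` and writes
 * UTF-16BE bytes (no BOM, no terminator) to `out`, which holds `cap` bytes.
 * Returns the number of bytes written, or (size_t)-1 on malformed input
 * or insufficient space. */
static size_t utf8_to_utf16be_bytes(const char *in, unsigned char *out, size_t cap)
{
    const unsigned char *p = (const unsigned char *)in;
    size_t n = 0;

    while (*p != 0) {
        uint32_t cp, min;
        int extra;

        /* The lead byte gives the sequence length and the smallest
         * code point that may legally use that length. */
        if (*p < 0x80)                { cp = *p;        extra = 0; min = 0; }
        else if ((*p & 0xE0) == 0xC0) { cp = *p & 0x1F; extra = 1; min = 0x80; }
        else if ((*p & 0xF0) == 0xE0) { cp = *p & 0x0F; extra = 2; min = 0x800; }
        else if ((*p & 0xF8) == 0xF0) { cp = *p & 0x07; extra = 3; min = 0x10000; }
        else return (size_t)-1;
        p++;

        /* Continuation bytes must look like 10xxxxxx; this also stops
         * at a premature NUL terminator. */
        for (int i = 0; i < extra; i++, p++) {
            if ((*p & 0xC0) != 0x80)
                return (size_t)-1;
            cp = (cp << 6) | (uint32_t)(*p & 0x3F);
        }
        if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            return (size_t)-1;

        if (cp < 0x10000) {
            if (cap - n < 2) return (size_t)-1;
            put_be16(out + n, cp);
            n += 2;
        } else {
            /* Code points above U+FFFF become a surrogate pair. */
            if (cap - n < 4) return (size_t)-1;
            cp -= 0x10000;
            put_be16(out + n, 0xD800 + (cp >> 10));
            put_be16(out + n + 2, 0xDC00 + (cp & 0x3FF));
            n += 4;
        }
    }
    return n;
}
```

Because each code unit is written byte by byte, the output is big-endian on any host and no separate byte swap is needed.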

You might also want to check if UTF-16 is supported and do your own byte swapping if necessary. But I do not believe g_convert supports UTF-16 either.


Looks like your system does not support that conversion. (This error means that iconv() returned EINVAL.)

On my Linux system it does appear to be supported:

echo "Hello" | iconv --from-code UTF-16BE --to-code UTF-8

(Obviously "Hello" is not a valid UTF-16 string, but it does get converted to something, so the conversion itself seems to be supported.)

Check whether UTF-16BE appears in the output of "iconv --list".
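The same check can be made from C, since iconv_open() fails when the requested pair is not available. This is a small sketch assuming a POSIX iconv (on some platforms linking needs -liconv); the helper name `iconv_supports` is hypothetical.

```c
#include <iconv.h>

/* Hypothetical helper: returns 1 if the local iconv can convert from
 * `from` to `to`, and 0 otherwise. */
static int iconv_supports(const char *to, const char *from)
{
    iconv_t cd = iconv_open(to, from);
    if (cd == (iconv_t)-1)
        return 0;  /* errno is EINVAL when the pair is unsupported */
    iconv_close(cd);
    return 1;
}
```

Calling `iconv_supports("UTF-16BE", "UTF-8")` mirrors the conversion that g_convert() is asked to perform. The encoding names are matched as strings, so the spelling has to be one that the local iconv actually lists.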

In this particular case your simplest solution might be to just use g_utf8_to_utf16(): http://library.gnome.org/devel/glib/stable/glib-Unicode-Manipulation.html#g-utf8-to-utf16

You can easily do your own byte swap (untested code):

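/* len is the number of 16-bit code units in result_str, not bytes */
if (G_BYTE_ORDER != G_BIG_ENDIAN) {
  for (glong i = 0; i < len; ++i) {
    result_str[i] = GUINT16_TO_BE(result_str[i]);  /* store as big-endian */
  }
}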