I am accessing an ICU4C function through JNI which returns a UChar * (i.e. a Unicode character array). I was able to convert that to a jbyteArray by assigning each member of the UChar array to a local jbyte[] array that I created, and then I returned it to Java using env->SetByteArrayRegion(). Now I have the byte[] array in Java, but it's pretty much all gibberish, weird symbols at best. I am not sure where the problem might be. I am working with Unicode characters, if that matters. How do I convert the byte[] to a char[] in Java properly? Something is not being mapped right. Here is a snippet of the code:
--- JNI code (altered slightly to make it shorter) ---
static jint testFunction(JNIEnv* env, jclass c, jcharArray srcArray, jbyteArray destArray) {
jchar* src = env->GetCharArrayElements(srcArray, NULL);
int n = env->GetArrayLength(srcArray);
UChar *testStr = new UChar[n];
jbyte destChr[n];
//calling ICU4C function here
icu_function (src, testStr); //takes source characters and returns UChar*
for (int i=0; i<n; i++)
destChr[i] = testStr[i]; //is this correct?
delete[] testStr;
env->SetByteArrayRegion(destArray, 0, n, destChr);
env->ReleaseCharArrayElements(srcArray, src, JNI_ABORT);
return (n); //anything for now
}
--- Java code ---
String wohoo = "ABCD bal bla bla";
char[] myChars = wohoo.toCharArray();
byte[] myICUBytes = new byte[myChars.length];
int value = MyClass.testFunction (myChars, myICUBytes);
System.out.println(new String(myICUBytes)) ;// produces gibberish & weird symbols
I also tried System.out.println(new String(myICUBytes, Charset.forName("UTF-16"))) and the output is just as garbled.
Note that the ICU function does return the proper Unicode characters in the UChar *. Somewhere between the conversion to jbyteArray and Java, something is messing it up.
Help!
destChr[i] = testStr[i]; //is this correct?
This looks like an issue all right.
JNI types:
Java type   Native type   Description
byte        jbyte         signed 8 bits
char        jchar         unsigned 16 bits
ICU4C types:
Define UChar to be wchar_t if that is 16 bits wide; always assumed to be unsigned.
If wchar_t is not 16 bits wide, then define UChar to be uint16_t or char16_t because GCC >=4.4 can handle UTF16 string literals. This makes the definition of UChar platform-dependent but allows direct string type compatibility with platforms with 16-bit wchar_t types.
So, aside from anything icu_function might be doing, you are trying to fit a 16-bit value into an 8-bit-wide type.
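To make the data loss concrete, here is a minimal Java sketch that mimics the destChr[i] = testStr[i] assignment. The class and method names (NarrowingDemo, truncate) are made up for illustration and are not part of the original code; the point is only that a narrowing cast keeps the low 8 bits of each 16-bit code unit.

```java
class NarrowingDemo {
    // Mimics the C++ loop: keeps only the low 8 bits of each 16-bit code unit.
    static byte[] truncate(char[] utf16) {
        byte[] out = new byte[utf16.length];
        for (int i = 0; i < utf16.length; i++) {
            out[i] = (byte) utf16[i]; // the high byte is silently discarded
        }
        return out;
    }

    public static void main(String[] args) {
        char[] src = {'A', '\u00e9', '\u4e2d'}; // U+0041, U+00E9, U+4E2D
        byte[] narrowed = truncate(src);
        for (int i = 0; i < src.length; i++) {
            // Print each code unit next to the byte that survives the cast
            System.out.printf("U+%04X -> 0x%02X%n", (int) src[i], narrowed[i] & 0xFF);
        }
    }
}
```

Plain ASCII survives the cast, but anything above U+00FF comes out as an unrelated byte (U+4E2D becomes 0x2D here), which is consistent with the gibberish you are seeing.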
If you must use a Java byte array, I suggest transcoding the UChar data to a byte-oriented Unicode encoding such as UTF-8, which fits the 8-bit char type.
To paraphrase some C code:
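#include <stdlib.h>
#include <unicode/ucnv.h>

UErrorCode status = U_ZERO_ERROR;
UChar *utf16 = (UChar*) malloc(len16 * sizeof(UChar));
//TODO: fill data
// convert to UTF-8
UConverter *encoding = ucnv_open("UTF-8", &status);
// preflight: with a NULL destination this returns the required length
// and sets status to U_BUFFER_OVERFLOW_ERROR
int len8 = ucnv_fromUChars(encoding, NULL, 0, utf16, len16, &status);
status = U_ZERO_ERROR; // reset after the preflight call
char *utf8 = (char*) malloc(len8 * sizeof(char));
ucnv_fromUChars(encoding, utf8, len8, utf16, len16, &status);
ucnv_close(encoding);
//TODO: char to jbyte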
You can then transcode this to a Java String using new String(myICUBytes, "UTF-8").
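On the Java side, the decoding step might look like the sketch below. It assumes the native code has filled the array with UTF-8 bytes; the class name Utf8RoundTrip and the stand-in data are illustrative only, since the real bytes would come from the JNI call.

```java
import java.nio.charset.StandardCharsets;

class Utf8RoundTrip {
    // Decodes bytes that the native side is assumed to have written as UTF-8.
    static String decode(byte[] utf8) {
        return new String(utf8, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Stand-in for the bytes filled in by the native call
        String original = "ABCD \u00e9\u4e2d";
        byte[] fromNative = original.getBytes(StandardCharsets.UTF_8);
        // The decoded string matches the original text
        System.out.println(decode(fromNative).equals(original));
    }
}
```

Keep in mind that the UTF-8 form can be longer than the UTF-16 form: each UTF-16 code unit may need up to 3 bytes, so a byte[] sized to myChars.length can be too small for non-ASCII text.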
I used UTF-8 because it was already in my sample code and you don't have to worry about endianness. Convert my C to C++ as appropriate.
Have you considered using ICU4J?
Also, when converting your bytes to a string, you will need to specify a character encoding. I'm not familiar with the library in question, so I can't advise you further, but perhaps this will be "UTF-16" or similar?
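If the native side hands back raw UTF-16 code units as bytes, the byte order matters too. The sketch below (class and method names are hypothetical) decodes the same four bytes as little-endian and as big-endian. Java's plain "UTF-16" charset assumes big-endian when there is no byte order mark, so naming UTF-16LE or UTF-16BE explicitly is the safer choice.

```java
import java.nio.charset.StandardCharsets;

class Utf16ByteOrder {
    static String decodeLE(byte[] raw) {
        return new String(raw, StandardCharsets.UTF_16LE);
    }

    static String decodeBE(byte[] raw) {
        return new String(raw, StandardCharsets.UTF_16BE);
    }

    public static void main(String[] args) {
        // "AB" as little-endian UTF-16: low byte first
        byte[] raw = {0x41, 0x00, 0x42, 0x00};
        System.out.println(decodeLE(raw)); // read with the matching byte order
        System.out.println(decodeBE(raw)); // same bytes, wrong byte order: U+4100 U+4200
    }
}
```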
Oh, and it's also worth noting that you might simply be getting display errors because the terminal you're printing to isn't using the correct character set and/or doesn't have the right glyphs available.
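One way to rule the terminal out is to print the numeric code units instead of the characters themselves. This is only a sketch, and the helper name hex is made up for illustration:

```java
class CodeUnitDump {
    // Formats each UTF-16 code unit of the string as U+XXXX, separated by spaces.
    static String hex(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            if (i > 0) {
                sb.append(' ');
            }
            sb.append(String.format("U+%04X", (int) s.charAt(i)));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // The output is pure ASCII, so it does not depend on the console's charset or fonts
        System.out.println(hex("A\u00e9\u4e2d"));
    }
}
```

If the hex values are what you expect, the data is fine and the problem is in how the console displays it.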