Java encoding for Japanese characters

Posted on 开发者 (https://www.devze.com), 2023-02-14 07:27. Source: the web.
I have a file whose name contains Japanese characters. File name: S-最終条件.pdf. In Java, the file name is: S-最終条件.pdf.

// Support for Japanese file name
fileNameX = new String(fileName.getBytes("Shift_JIS"),"ISO8859_1");

The resulting fileNameX comes out as S?最終条件.pdf, which causes an error. I am trying to stream the file out as a PDF, but the particular Japanese character "-" is not recognised, and an error is thrown while streaming.

Please help me solve this issue.

Thanks, Prasanna


Let's see what your code actually does:

// Encode the String fileName (UTF-16 internally) as Shift_JIS.
// bytes now holds the binary Shift_JIS representation of your String.
final byte[] bytes = fileName.getBytes("Shift_JIS");

// Create a new String by decoding those Shift_JIS bytes as if they were ISO8859_1.
// Every byte is misread as a Latin-1 character, so the text no longer matches the original.
new String(bytes, "ISO8859_1");

Java strings use UTF-16 for their internal representation. You cannot specify a target encoding when you create a String, because UTF-16 is fixed. Instead, you have to specify the correct source encoding of the byte array, which here is "Shift_JIS".

The file name should come out correctly without any conversion.
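To make the point above concrete, here is a minimal sketch that contrasts decoding with the same charset against decoding as ISO-8859-1. The class and method names (RoundTripDemo, roundTrip, misdecode) are made up for illustration, and the sample string uses only the kanji part of the file name, written as Unicode escapes so the source file encoding does not matter. It assumes your JDK ships the Shift_JIS charset, which standard builds do.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class RoundTripDemo {
    // Assumption: the extended charsets (including Shift_JIS) are available in this JDK.
    static final Charset SJIS = Charset.forName("Shift_JIS");

    // Encode to Shift_JIS bytes, then decode those bytes with the SAME charset.
    static String roundTrip(String s) {
        return new String(s.getBytes(SJIS), SJIS);
    }

    // What the code in the question does: decode Shift_JIS bytes as ISO-8859-1.
    static String misdecode(String s) {
        return new String(s.getBytes(SJIS), StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // "最終条件.pdf" written with Unicode escapes
        String name = "\u6700\u7D42\u6761\u4EF6.pdf";

        // Same charset on both sides: the original text is restored.
        System.out.println(roundTrip(name).equals(name));   // prints true

        // Mismatched charsets: every Shift_JIS byte becomes one Latin-1 char.
        System.out.println(misdecode(name).equals(name));   // prints false
    }
}
```

If the string never needs to leave Java, neither call is necessary; the conversion only matters at the point where bytes are actually written or read.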


This is a mapping problem between Shift_JIS and Unicode. Shift_JIS does not cover every Unicode character, so characters it cannot represent become "?".

The following shows the result of converting several dash-like characters from Unicode to Shift_JIS ([NG] = cannot be converted, <OK> = converts correctly):

RESULT  UNICODE
[NG]    U+2012 (FIGURE DASH)
[NG]    U+2013 (EN DASH)
<OK>    U+2014 (EM DASH)
[NG]    U+2015 (HORIZONTAL BAR)
<OK>    U+2212 (MINUS SIGN)
[NG]    U+FF0D (FULLWIDTH HYPHEN-MINUS)

One solution is to replace the unmappable code points with mappable ones before converting:

U+2012,U+2013,U+2015 --> U+2014
U+FF0D               --> U+2212
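The replacement above could be sketched roughly like this. The class and method names (DashNormalizer, normalizeDashes, encodable) are hypothetical, and the sample name assumes the dash in the file name is U+FF0D, which the question does not confirm. CharsetEncoder.canEncode is used to check the result rather than relying on the table, since mappings can differ between runtimes.

```java
import java.nio.charset.Charset;

public class DashNormalizer {
    // Assumption: the Shift_JIS charset is available in this JDK.
    static final Charset SJIS = Charset.forName("Shift_JIS");

    // Apply the substitutions listed above:
    // U+2012, U+2013, U+2015 -> U+2014 and U+FF0D -> U+2212.
    static String normalizeDashes(String s) {
        return s.replace('\u2012', '\u2014')
                .replace('\u2013', '\u2014')
                .replace('\u2015', '\u2014')
                .replace('\uFF0D', '\u2212');
    }

    // True if every character of s can be encoded in Shift_JIS.
    static boolean encodable(String s) {
        return SJIS.newEncoder().canEncode(s);
    }

    public static void main(String[] args) {
        // Assumed sample: "S", U+FF0D, then "最終条件.pdf" as Unicode escapes.
        String name = "S\uFF0D\u6700\u7D42\u6761\u4EF6.pdf";
        String fixed = normalizeDashes(name);

        System.out.println("before: encodable=" + encodable(name));
        System.out.println("after:  encodable=" + encodable(fixed));
    }
}
```

Note that this changes the file name itself, so it only fits cases where the substituted dash is acceptable to whatever consumes the name.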


The answers by @josefx and @Yu Sun corn are both correct.

First, as @josefx answered, when you take the Shift JIS representation of a string and convert it back to a String object, you have to pass the same encoding to both String#getBytes(String charsetName) and the constructor String(byte[] bytes, String charsetName).

Second, use Windows-31J instead of Shift_JIS as the encoding name. The two share the same encoding scheme, but their character sets differ slightly: Windows-31J has some additional characters. (Note that Windows documentation calls Windows-31J "Shift JIS", so in most cases Windows-31J is what you want when you need "Shift JIS".) As @Yu Sun corn answered, the string "S-最終条件.pdf" contains a character that is not in the Shift JIS character set: the dash after "S". The Windows-31J character set does contain this character.

Finally, the code you should use will be like this:

// Get the byte representation of the Japanese characters in the Windows-31J encoding.
// Windows-31J (aka MS932) is the default encoding when you run the Java VM on Windows with a Japanese locale.
byte[] textBytes = name.getBytes("Windows-31J");

// Reverse byte-stream representation to a String object
System.out.println(new String(textBytes, "Windows-31J"));
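One caveat with getBytes is that it silently substitutes "?" for anything the charset cannot represent. As a rough sketch, a CharsetEncoder set to CodingErrorAction.REPORT fails loudly instead, which makes it easy to compare the two encodings. StrictEncode and encodeStrict are hypothetical names, and the sample name again assumes the dash is U+FF0D.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;

public class StrictEncode {
    // Encode s with the named charset; throw instead of substituting '?'.
    static byte[] encodeStrict(String s, String charsetName) {
        try {
            ByteBuffer buf = Charset.forName(charsetName).newEncoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .encode(CharBuffer.wrap(s));
            byte[] out = new byte[buf.remaining()];
            buf.get(out);
            return out;
        } catch (CharacterCodingException e) {
            throw new IllegalArgumentException(
                    "Cannot encode \"" + s + "\" as " + charsetName, e);
        }
    }

    public static void main(String[] args) {
        // Assumed sample: "S", U+FF0D, then "最終条件.pdf" as Unicode escapes.
        String name = "S\uFF0D\u6700\u7D42\u6761\u4EF6.pdf";

        for (String cs : new String[] {"Shift_JIS", "windows-31j"}) {
            try {
                byte[] bytes = encodeStrict(name, cs);
                System.out.println(cs + ": " + bytes.length + " bytes");
            } catch (IllegalArgumentException e) {
                System.out.println(cs + ": " + e.getMessage());
            }
        }
    }
}
```

Running this on your own JDK shows directly which of the two charsets accepts the file name, without having to trust a mapping table.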