How to address this encoding issue in Java

I am having an encoding issue in Java. The string I need to handle is the output of the "systeminfo" command run from the Windows command line, and I need to present the result in an HTML document. The problem is that when I run my application on a French operating system, garbled characters are shown in the HTML, no matter how I try to convert the encoding.

From the log, I can see that the system encoding is "Cp1252". The code snippet is as follows:

String systemEncoding = System.getProperty("sun.jnu.encoding");
log.info("sun.jnu.encoding="+systemEncoding);

In the HTML builder class, I did something like this:

for(String line : lines){
    line = new String(line.getBytes("Cp1252"), "UTF8");
    osReport.append(line + "<br>");
}

Unfortunately, I still see garbled question marks all over the page where the French characters are supposed to be. The HTML header looks like this, by the way:

<HEAD>
<META content="text/html; charset=UTF-8" http-equiv=Content-Type>
</HEAD>

As for how I get the response string, please see the following piece of code:

try{
    String systemEncoding = System.getProperty("sun.jnu.encoding");
    log.info("sun.jnu.encoding="+systemEncoding);
    InputStreamReader isr;
    if (StringUtil.isEmpty(systemEncoding)) {
        isr = new InputStreamReader(is);
    } else {
        isr = new InputStreamReader(is, systemEncoding);
    }
    BufferedReader br = new BufferedReader(isr);
    String line=null;
    while ((line = br.readLine()) != null) {
        res.append(line);
        res.append(LINE_SEP);
    }   
 } catch (IOException ioe) {
    log.error("IOException occurred while printing the response",ioe);
 }

Any help?? Thanks so very much!

I am assuming you are invoking the command via the Process type. I would expect systeminfo.exe to write its output using the default ANSI encoding (windows-1252 on a French system).

That means that you can read the input with the default encoding (the one used by the InputStreamReader(InputStream) constructor). This transcodes the input from the default encoding to UTF-16. This code uses the Scanner type with the default system encoding:

Process process = new ProcessBuilder(command).redirectErrorStream(true)
    .start();
InputStream in = process.getInputStream();
try {
  Scanner scanner = new Scanner(in);
  while (scanner.hasNextLine()) {
    lines.add(scanner.nextLine());
  }
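  // waitFor() blocks until the process has ended; exitValue() would throw
  // IllegalThreadStateException if the process were still running
  if (process.waitFor() != 0 || scanner.ioException() != null) {
    // throw exceptions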
  }
} finally {
  in.close();
}

Java strings are always UTF-16, so code like this is just a transcoding bug:

new String(line.getBytes("Cp1252"), "UTF8");
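To see why that round trip is a bug, here is a minimal sketch (the class and method names are hypothetical, not from the original code). It applies the same conversion to a string that has already been decoded correctly and shows that the accented characters are destroyed rather than repaired:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class TranscodingBugDemo {

    // Repeats the conversion from the question: encode as windows-1252,
    // then (wrongly) decode those same bytes as UTF-8.
    public static String brokenRoundTrip(String text) {
        byte[] cp1252Bytes = text.getBytes(Charset.forName("windows-1252"));
        return new String(cp1252Bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // "été", written with escapes so the source file encoding does not matter
        String original = "\u00e9t\u00e9";
        String mangled = brokenRoundTrip(original);

        // The lone 0xE9 bytes are not valid UTF-8, so the decoder
        // substitutes the replacement character U+FFFD for each of them.
        System.out.println(mangled.equals(original));       // false
        System.out.println(mangled.indexOf('\uFFFD') >= 0); // true
    }
}
```

The point is that a String has no "encoding to fix" once it has been decoded; an encoding only matters at the moment bytes are read in or written out.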

Ensure that you are encoding your HTML file correctly, for example by writing it through a Writer with an explicit charset:

Charset utf8 = Charset.forName("UTF-8");
OutputStream out = new FileOutputStream(file);
Closeable stream = out;
try {
  Writer writer = new OutputStreamWriter(out, utf8);
  stream = writer;
  // write to writer here
} finally {
  stream.close();
}

I would not try to read or directly change system properties like sun.jnu.encoding or file.encoding - these are JVM implementation details and their direct use or configuration is not supported.

If you are relying on System.out to verify characters, ensure the device consuming the output decodes its input as windows-1252. See here for more on encoding.

Without declaring the character encoding, you can't reliably display those French characters in HTML as raw characters. In other words, this doesn't work:

<html>
<body>
accent égu et ce çedille :D
</body>
</html>

This results in:

accent Ã©gu et ce Ã§edille :D

So you have to declare the encoding in the meta headers, or replace all the French characters with their escaped equivalents (character references). Full list here.
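The escaping route could look roughly like the sketch below. It uses decimal numeric character references instead of named entities, so no lookup table is needed; the class and method names are made up for illustration, and it also escapes &, < and > on the assumption that the command output is inserted into the page as plain text:

```java
final class HtmlEscapeDemo {

    // Replaces every code point outside ASCII with a decimal numeric
    // character reference (e.g. &#233;) and escapes &, < and >.
    public static String escapeForHtml(String text) {
        StringBuilder out = new StringBuilder(text.length());
        text.codePoints().forEach(cp -> {
            switch (cp) {
                case '&': out.append("&amp;"); break;
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                default:
                    if (cp < 128) {
                        out.appendCodePoint(cp);                 // plain ASCII, copy as-is
                    } else {
                        out.append("&#").append(cp).append(';'); // numeric reference
                    }
            }
        });
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(escapeForHtml("accent \u00e9gu et ce \u00e7edille"));
        // prints: accent &#233;gu et ce &#231;edille
    }
}
```

Because the escaped output is pure ASCII, it displays the same way whichever encoding the browser ends up assuming for the page.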

As for the trick with the system character encoding: I don't think the encoding reported by sun.jnu.encoding is necessarily the same encoding that systeminfo.exe uses for its output.
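If the tool's output encoding does differ from the JVM default, one option is to pass the charset explicitly when decoding the process output. The sketch below is only an illustration: the class name is hypothetical, a byte array stands in for process.getInputStream(), and "windows-1252" is a placeholder for whatever code page the tool actually writes on the target machine.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.util.ArrayList;
import java.util.List;

final class ExplicitCharsetReader {

    // Decodes the stream with the given charset instead of the JVM default.
    public static List<String> readLines(InputStream in, Charset charset) {
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader =
                 new BufferedReader(new InputStreamReader(in, charset))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return lines;
    }

    public static void main(String[] args) {
        // Stand-in for real process output: 0xE9 is "é" in windows-1252.
        byte[] sample = {(byte) 0xE9, 't', (byte) 0xE9, '\n'};
        // Assumption: replace with the code page the tool really uses.
        Charset assumed = Charset.forName("windows-1252");
        List<String> lines = readLines(new ByteArrayInputStream(sample), assumed);
        System.out.println(lines.get(0).equals("\u00e9t\u00e9")); // true
    }
}
```

Trying a couple of candidate charsets against the raw bytes and checking which one yields readable accents is a quick way to find out which encoding the tool really emits.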