
Problem with special characters

Posted on 开发者 (https://www.devze.com), 2023-01-08 14:52. Source: the web.

I am troubled by a typical problem with special characters.

We have an MBean running in a production Tomcat server (installed on Linux) that picks up XML feeds and sends them on for further processing. The problem crops up when the MBean has to process special characters, which are replaced by '??' marks. The same code runs on the local dev and QA servers and works fine there, even though the OS version and the Tomcat version are the same. The part of the code that reads the XML feed and sends it to a JMS queue is pasted below:

StringBuffer article = new StringBuffer();

InputStreamReader is = new InputStreamReader(new FileInputStream(pendingFile), "utf-8");
int data;
while ((data = is.read()) != -1) {
    article.append((char)data);
}
is.close();
is = null;

log.debug("Read in \n" + article.toString());
try {
    js.writeTextMessage(article.toString(), "server", hostName, processor);
} catch (JMSException je) {
    log.error("jms exception: " + je.getMessage());
    // server probably shutdown
    this.stop();
    return;
}

The above code reads the file referenced by pendingFile, appends its contents to a StringBuffer, writes them to a log, and posts them to a JMS queue. The log file displays the special characters as '??', but only in production. The XML feed with special characters is shown below:

<?xml version="1.0" encoding="UTF-8"?>
<hedline>
    <hl1>
        Hotelliyöpymiset: Missä hinta ja palvelu vastaavat toisiaan (tai eivät) - asiakastyytyväisyyden huippukaupungit
    </hl1>
</hedline>

We tried all the possibilities we could think of, which include:

  1. Setting the URI encoding to UTF-8 in server.xml for Tomcat.
  2. Verifying that the LANG environment variable is en_US.UTF-8 on Linux.
  3. Verifying that the XML file is encoded as UTF-8 without a BOM.

We are unable to determine whether the cause lies with the Tomcat server or the Linux OS. Please help.
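
One thing that might be worth checking in addition to the list above: the LANG value seen in a login shell is not necessarily the one the Tomcat process was started with, so it can help to ask the JVM itself which default charset it picked up. The following is only a sketch; the class name CharsetCheck is hypothetical and not part of the original code.

```java
import java.nio.charset.Charset;

// Hypothetical helper, not from the original post: reports the charset
// settings the running JVM actually uses.
class CharsetCheck {
    // Builds a one-line summary of the JVM's default charset and the
    // file.encoding system property.
    static String describe() {
        return "default charset=" + Charset.defaultCharset().name()
                + ", file.encoding=" + System.getProperty("file.encoding");
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```

Logging the result of CharsetCheck.describe() from inside the MBean on both the production and the QA server would show whether the two JVMs really run with the same settings.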


Don't log the article string just as text. Dump each character out as a hex integer. That way you can tell whether it's the logging which is failing, or the reading which is failing.
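
A minimal sketch of such a dump, assuming a hypothetical helper class named HexDump (it is not part of the original code):

```java
// Hypothetical helper, not from the original post: renders a string as
// hex code units so the output survives any log encoding.
class HexDump {
    // Returns each UTF-16 code unit of the text as a four-digit
    // lowercase hex value, separated by spaces.
    static String toHex(String text) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            if (i > 0) {
                out.append(' ');
            }
            out.append(String.format("%04x", (int) text.charAt(i)));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // "yöp" read correctly prints: 0079 00f6 0070
        System.out.println(toHex("y\u00f6p"));
    }
}
```

In the code from the question this could be called as log.debug(HexDump.toHex(article.toString())). Because the output is plain ASCII, it cannot be damaged by the log encoding. If the ö in the feed shows up as 00f6, the file was read correctly and the logging is the likely culprit; if it shows up as fffd, the bytes were not decoded as valid UTF-8 when reading.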

It's not clear to me what the JMS queue's behaviour is - is it only the logging which is failing, or the JMS as well?


When you are logging via Log4j, for example with a FileAppender, you can set the encoding of the log file:

<appender name="SOME_LOG" class="org.apache.log4j.RollingFileAppender">
    <param name="Encoding" value="UTF-8" />
    <!-- remaining params and layout as in your existing configuration -->
</appender>

Additionally, an appropriate charset must be installed on the machine for the characters to be displayed correctly.
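
To illustrate why the encoding of the log file matters, here is a small sketch (the class name EncodingLoss is hypothetical) of what happens when text is written with a charset that cannot represent a character. It uses US-ASCII only as an example of such a charset; which charset the production appender actually uses is not known from the question.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo, not from the original post: shows how an unsuitable
// output charset turns special characters into '?'.
class EncodingLoss {
    // Encodes the text with the given charset and decodes it again,
    // similar to writing a log file and then viewing it.
    static String roundTrip(String text, Charset charset) {
        return new String(text.getBytes(charset), charset);
    }

    public static void main(String[] args) {
        String text = "y\u00f6p"; // "yöp"
        // US-ASCII cannot represent 'ö', so it is replaced: prints y?p
        System.out.println(roundTrip(text, StandardCharsets.US_ASCII));
        // UTF-8 can represent it, so nothing is lost: prints true
        System.out.println(roundTrip(text, StandardCharsets.UTF_8).equals(text));
    }
}
```

String.getBytes replaces characters the charset cannot map with that charset's replacement byte, which is '?' for US-ASCII. A log appender writing with a non-UTF-8 encoding can lose characters in a similar way even though the string in memory is intact.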
