
How to convert non-supported character to html entity in Java

Developer https://www.devze.com 2022-12-12 01:53 Source: Internet
Some characters are not supported by certain charsets, so the test below fails. I would like to use HTML entities to encode ONLY those unsupported characters. How can I do this in Java?

public void testWriter() throws IOException{
    String c = "\u00A9";
    String encoding = "gb2312";
    ByteArrayOutputStream outStream = new ByteArrayOutputStream();
    Writer writer = new BufferedWriter(new OutputStreamWriter(outStream, encoding));
    writer.write(c);
    writer.close();
    String result = new String(outStream.toByteArray(), encoding);
    assertEquals(c, result);
}
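To see why the assertion fails, here is a small diagnostic sketch. It assumes the JRE ships the GB2312 charset (standard JDKs include it), and the class and method names are hypothetical. It asks the encoder directly whether U+00A9 is representable, then repeats the question's write-and-decode round trip. `OutputStreamWriter` built from a `Charset` does not throw on unmappable characters; it silently substitutes the encoder's replacement bytes, so the decoded string no longer equals the input.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.Charset;

public class Gb2312Check {
    // True if every character in s is representable in the named charset.
    static boolean isEncodable(String s, String charsetName) {
        return Charset.forName(charsetName).newEncoder().canEncode(s);
    }

    // Mirrors the question's test: write through an OutputStreamWriter, then decode.
    static String roundTrip(String s, String charsetName) {
        Charset cs = Charset.forName(charsetName);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (Writer writer = new OutputStreamWriter(out, cs)) {
            // Unmappable characters are replaced with the encoder's
            // replacement bytes; no exception is thrown here.
            writer.write(s);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new String(out.toByteArray(), cs);
    }

    public static void main(String[] args) {
        String c = "\u00A9"; // COPYRIGHT SIGN
        System.out.println("canEncode: " + isEncodable(c, "gb2312"));
        System.out.println("round trip: " + roundTrip(c, "gb2312"));
    }
}
```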


I'm not positive I understand the question, but something like this might help:

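import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;

...

  // Copy encodable characters as-is; replace the rest with numeric entities.
  StringBuilder buf = new StringBuilder(c.length());
  CharsetEncoder enc = Charset.forName("gb2312").newEncoder();
  for (int idx = 0; idx < c.length(); ++idx) {
    char ch = c.charAt(idx);
    if (enc.canEncode(ch))
      buf.append(ch);           // representable in gb2312
    else {
      buf.append("&#");         // not representable: emit &#NNN;
      buf.append((int) ch);
      buf.append(';');
    }
  }
  String result = buf.toString();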

This code is not robust, because it doesn't handle characters beyond the Basic Multilingual Plane: those are stored as surrogate pairs, and each half is checked (and escaped) on its own. By iterating over the code points in the String instead, and using the canEncode(CharSequence) method of the CharsetEncoder, you should be able to handle any character.
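A minimal sketch of the code-point version described above; the class and method names are hypothetical. Each code point (one `char`, or two for a surrogate pair) is passed to `canEncode(CharSequence)` as a unit, and anything the charset cannot represent becomes a decimal entity built from the full code point.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;

public final class EntityEscaper {
    // Returns s with every code point the charset cannot encode replaced
    // by a decimal numeric character reference such as &#169;.
    public static String escapeUnencodable(String s, Charset charset) {
        CharsetEncoder enc = charset.newEncoder();
        StringBuilder buf = new StringBuilder(s.length());
        int i = 0;
        while (i < s.length()) {
            int cp = s.codePointAt(i);
            int len = Character.charCount(cp); // 2 for a surrogate pair
            CharSequence unit = s.subSequence(i, i + len);
            if (enc.canEncode(unit)) {
                buf.append(unit);                       // representable: copy
            } else {
                buf.append("&#").append(cp).append(';'); // escape whole code point
            }
            i += len;
        }
        return buf.toString();
    }

    public static void main(String[] args) {
        Charset gb2312 = Charset.forName("gb2312");
        // COPYRIGHT SIGN, two Chinese characters, and U+1F600 (outside the BMP)
        String input = "\u00A9 \u4E2D\u6587 \uD83D\uDE00";
        System.out.println(escapeUnencodable(input, gb2312));
    }
}
```

Like the original answer, this escapes only unencodable characters; markup characters such as `&` and `<` are left alone, so text that is not already HTML-safe still needs ordinary HTML escaping.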


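Try using StringEscapeUtils from Apache Commons (for example escapeHtml4). Note that it escapes according to a fixed HTML entity table rather than according to what the target charset can encode, so it may escape more or fewer characters than gb2312 actually requires.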


Just use UTF-8; then there is no reason to use entities at all. If the argument is that some clients need gb2312 because they don't understand Unicode, then entities are not much use either, because numeric entities represent Unicode code points.
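A short sketch of both points, with hypothetical names: writing through UTF-8 round-trips U+00A9 unchanged (so the question's assertion would pass), and a decimal numeric entity is simply the character's Unicode code point, whatever charset the page itself uses.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

public class Utf8RoundTrip {
    // Same write-then-decode pattern as the question, but with UTF-8.
    static String roundTripUtf8(String s) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (Writer writer = new OutputStreamWriter(out, StandardCharsets.UTF_8)) {
            writer.write(s);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    // A decimal numeric entity encodes the Unicode code point itself.
    static String numericEntity(int codePoint) {
        return "&#" + codePoint + ";";
    }

    public static void main(String[] args) {
        System.out.println(roundTripUtf8("\u00A9").equals("\u00A9")); // true
        System.out.println(numericEntity("\u00A9".codePointAt(0)));   // &#169;
    }
}
```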
