Convert HTML symbols and HTML names to HTML number using Java

Developer https://www.devze.com 2023-03-07 12:31 Source: Internet
I have an XML file which contains many special symbols like ® (HTML number &#174;) and HTML names like &atilde; (HTML number &#227;).

I am trying to replace these HTML symbols and HTML names with the corresponding HTML numbers using Java. For this, I first read the XML file into a string and then used the replaceAll method as:

File fn = new File("myxmlfile.xml");
String content = FileUtils.readFileToString(fn);
content = content.replaceAll("®", "&\#174");
FileUtils.writeStringToFile(fn, content);

But this is not working.

Can anyone please tell me how to do this?

Thanks!


The signature for the replaceAll method is:

public String replaceAll(String regex, String replacement)

You have to be careful that your first parameter is a valid regular expression. The Java Pattern class describes the constructs used in a Java regular expression.
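
Because the first argument is a regular expression, a plain-text search is often simpler with String.replace, which treats both arguments literally. The following is only a minimal sketch: the class and method names are hypothetical, and ® is written as \u00AE so that the result does not depend on the encoding of the source file.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical class name, used only for illustration.
public class LiteralReplaceSketch {

    // String.replace treats both arguments as plain text, so no regex rules apply.
    static String withReplace(String content) {
        return content.replace("\u00AE", "&#174;");
    }

    // Same result through replaceAll: quote the pattern and the replacement
    // so that regex metacharacters, '$' and '\' are all taken literally.
    static String withReplaceAll(String content) {
        return content.replaceAll(Pattern.quote("\u00AE"),
                                  Matcher.quoteReplacement("&#174;"));
    }

    public static void main(String[] args) {
        String sample = "Acme\u00AE widgets";
        System.out.println(withReplace(sample));     // Acme&#174; widgets
        System.out.println(withReplaceAll(sample));  // Acme&#174; widgets
    }
}
```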

Based on what I see in the Pattern class description, I don't see what's wrong with:

content = content.replaceAll("®", "&\#174");

You could try quoting the search string so that it is matched literally:

content = content.replaceAll(Pattern.quote("®"), "&#174;");

and see if that works better.


I don't think that \# is a valid escape sequence in a Java string literal, so the code should not even compile. BTW, what's wrong with "&#174;"?
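
Rather than listing every symbol by hand, one option is to walk the string and emit a decimal reference for each non-ASCII character. This is a sketch under the assumption that every code point above 127 should be converted; the class and method names are hypothetical.

```java
// Hypothetical class name, used only for illustration.
public class NumericEntitySketch {

    // Replaces every code point above 127 with a decimal character reference.
    static String toNumericReferences(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            if (cp > 127) {
                out.append("&#").append(cp).append(';');
            } else {
                out.appendCodePoint(cp);
            }
            // Advance by one or two chars, depending on surrogate pairs.
            i += Character.charCount(cp);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // prints: Acme&#174; S&#227;o Paulo
        System.out.println(toNumericReferences("Acme\u00AE S\u00E3o Paulo"));
    }
}
```

This only works if the file was decoded with the right charset in the first place. If the XML is UTF-8 but is read with a different platform default, ® will not be present in the string as a single character, which may be one reason a replacement appears to do nothing.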


If you want HTML numbers, try escaping for XML first.

Use StringEscapeUtils from Apache Commons Lang.

Java may have trouble dealing with it, so I prefer to escape for Java first, and after that for XML or HTML.

    // Method names as in Commons Lang 2.x; Lang 3 renames escapeHtml to escapeHtml4.
    String escapedStr = StringEscapeUtils.escapeJava(yourString);
    escapedStr = StringEscapeUtils.escapeXml(yourString);
    escapedStr = StringEscapeUtils.escapeHtml(yourString);
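
The utilities above go from characters to escapes. For the other half of the question (HTML names such as &atilde; to HTML numbers), one library-free option is a lookup table. The sketch below is an illustration only: the class name is hypothetical and the table holds just three hand-picked entries, where a real one would need the full list of HTML entity names.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical class name, used only for illustration.
public class NamedEntitySketch {

    // Tiny hand-picked table; a real one would cover every HTML entity name.
    private static final Map<String, Integer> CODE_POINTS = new HashMap<String, Integer>();
    static {
        CODE_POINTS.put("reg", 174);
        CODE_POINTS.put("atilde", 227);
        CODE_POINTS.put("copy", 169);
    }

    // Matches references such as &atilde; (a name between '&' and ';').
    private static final Pattern NAMED = Pattern.compile("&([A-Za-z][A-Za-z0-9]*);");

    static String namedToNumeric(String text) {
        Matcher m = NAMED.matcher(text);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            Integer cp = CODE_POINTS.get(m.group(1));
            // Names missing from the table (including XML's own &amp;, &lt;) stay untouched.
            String replacement = (cp == null) ? m.group() : "&#" + cp + ";";
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        // prints: S&#227;o Paulo &#174; &amp; more
        System.out.println(namedToNumeric("S&atilde;o Paulo &reg; &amp; more"));
    }
}
```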