
Java 1.6: javax.xml.transform.Transformer refuses to indent xml strings which contain newlines

Developer https://www.devze.com 2023-03-26 08:48 Source: Internet

I need to be able to pretty print XML strings using Java APIs and have found multiple solutions for this, both on the web and on this particular website. However, despite multiple attempts to get this to work with javax.xml.transform.Transformer, it's been a failure so far. The code I provide below works only partially: it indents correctly only when the XML string in the argument does not contain any newlines between XML elements. This just won't do. I need to be able to pretty print anything, assuming it is well-formed and valid XML, even previously pretty printed strings.

I have this (put together from code snippets I found; people claimed it worked for them):

import java.io.*;
import javax.xml.transform.*;
import javax.xml.transform.stream.*;

public class XMLFormatter {

    public static String format(String xml, int indent, boolean omitXmlDeclaration)
            throws TransformerException {

        if (indent < 0) {
            throw new IllegalArgumentException();
        }
        StringReader reader = new StringReader(xml);
        StringWriter writer = new StringWriter();

        TransformerFactory factory = TransformerFactory.newInstance();
        // Implementation-specific attribute; a factory that does not
        // recognize it may throw IllegalArgumentException.
        factory.setAttribute("indent-number", Integer.valueOf(indent));
        Transformer transformer = factory.newTransformer();
        if (omitXmlDeclaration) {
            transformer.setOutputProperty(
                    OutputKeys.OMIT_XML_DECLARATION, "yes");
        }
        transformer.setOutputProperty(OutputKeys.INDENT, "yes");
        // Xalan-style output property that sets the indent width.
        transformer.setOutputProperty(
                "{http://xml.apache.org/xslt}indent-amount",
                String.valueOf(indent));
        transformer.setOutputProperty(OutputKeys.METHOD, "xml");
        transformer.transform(
                new StreamSource(reader),
                new StreamResult(writer));

        // StringReader and StringWriter hold no system resources,
        // so there is nothing to close here.
        return writer.toString();
    }

    public static void main(String[] args) throws TransformerException {
        StringBuilder sb = new StringBuilder();
        sb.append("<rpc-reply><data><smth/></data></rpc-reply>");

        System.out.println(sb.toString());
        System.out.println();
        System.out.println(XMLFormatter.format(sb.toString(), 4, false));

        final String NEWLINE = System.getProperty("line.separator");
        sb.setLength(0);
        sb.append("<rpc-reply>");sb.append(NEWLINE);
        sb.append("<data>");sb.append(NEWLINE);
        sb.append("<smth/>");sb.append(NEWLINE);
        sb.append("</data>");sb.append(NEWLINE);
        sb.append("</rpc-reply>");

        System.out.println(sb.toString());
        System.out.println();
        System.out.println(XMLFormatter.format(sb.toString(), 4, false));
    }
}

This code should not be bothered by those newlines, should it? Is this a bug or am I missing something vital here? The output for the code snippet:

<rpc-reply><data><smth/></data></rpc-reply>

<?xml version="1.0" encoding="UTF-8"?>
<rpc-reply>
    <data>
        <smth/>
    </data>
</rpc-reply>

<rpc-reply>
<data>
<smth/>
</data>
</rpc-reply>

<?xml version="1.0" encoding="UTF-8"?>
<rpc-reply>
<data>
<smth/>
</data>
</rpc-reply>

As far as I can tell, my code only differs from other examples in that I use StringWriter and StringReader for the transform(in, out) method. I've already tried converting the XML to a ByteArrayOutputStream and even parsing it with DOM and then feeding it to the transformer, but the result is the same. I would really like to know why this only works for single-line strings.

I'm using jdk1.6_u24 combined with NetBeans 6.9.1.

This question is related to, but not the same as, the following (and probably a multitude of others):

How to pretty print XML from Java?

indent XML text with Transformer

Indent XML made with Transformer


I've concluded that this is normal behavior for Transformer. What's more, its indent functionality is not meant to be used as a pretty printer, at least not on its own. Pretty printing XML changes its structure unless you know exactly what the document should look like (based on its XSD, DTD or something similar). That is the only way to determine which newline characters are to be considered ignorable whitespace and which are actual element values or a part of them. Transformer does not reformat existing whitespace, and that's why the output of my code is what it is.
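To illustrate the point about ignorable whitespace, here is a minimal sketch (not from the original code) that counts the child nodes of the root element. The class name WhitespaceNodeDemo, the method rootChildCount and the inline DTD are made up for this example. It assumes the JDK's default DOM parser: without a DTD the newlines come back as ordinary text nodes, while a validating parser that knows the content model can be asked to drop them via setIgnoringElementContentWhitespace.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical demo class, not part of the original post.
public class WhitespaceNodeDemo {

    // Returns the number of direct child nodes of the root element.
    static int rootChildCount(String xml, boolean validateAgainstDtd)
            throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        if (validateAgainstDtd) {
            // Whitespace can only be classified as ignorable when the
            // parser validates and therefore knows the content model.
            dbf.setValidating(true);
            dbf.setIgnoringElementContentWhitespace(true);
        }
        DocumentBuilder builder = dbf.newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(xml)));
        return doc.getDocumentElement().getChildNodes().getLength();
    }

    public static void main(String[] args) throws Exception {
        String body = "<rpc-reply>\n<data>\n<smth/>\n</data>\n</rpc-reply>";

        // ASSUMPTION: an inline DTD invented for this demo only.
        String dtd = "<!DOCTYPE rpc-reply ["
                + "<!ELEMENT rpc-reply (data)>"
                + "<!ELEMENT data (smth)>"
                + "<!ELEMENT smth EMPTY>"
                + "]>";

        // No DTD: newline text node, <data>, newline text node.
        System.out.println(rootChildCount(body, false));

        // With DTD and validation: whitespace in element-only content
        // is reported as ignorable and left out of the tree.
        System.out.println(rootChildCount(dtd + body, true));
    }
}
```

Since the serializer sees those whitespace text nodes as real content in the first case, it has no safe way to tell them apart from meaningful text, which matches the unindented output shown above.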

So if you want to pretty print an already pretty printed XML string using Transformer or any other class, you first have to get rid of ignorable whitespace, and the only way to safely do that is to know what the structure of your XML document should be like. I'd like someone to confirm this statement for me, since this is currently only my assumption. If this statement is correct, how do third-party pretty printers do it? I know JTidy did not require an XSD, but pretty printed anyway. Does it simply treat all whitespace as ignorable whitespace unless it is enclosed in a text XML node? Are there other methods of determining and eliminating ignorable whitespace?
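One possible workaround when no schema is available is sketched below. It is an assumption-laden approach, not a confirmed description of how JTidy or any other pretty printer works: it simply treats every whitespace-only text node as ignorable, removes those nodes from a DOM tree with an XPath query, and then lets Transformer indent the result. The class name StrippingXMLFormatter is made up for this example. This is not safe for documents with mixed content or xml:space="preserve", where whitespace-only nodes may be significant.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class, not part of the original post.
public class StrippingXMLFormatter {

    public static String format(String xml, int indent) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));

        // ASSUMPTION: every whitespace-only text node is ignorable.
        // Select them all, then detach each one from its parent.
        XPath xpath = XPathFactory.newInstance().newXPath();
        NodeList blanks = (NodeList) xpath.evaluate(
                "//text()[normalize-space(.) = '']",
                doc, XPathConstants.NODESET);
        for (int i = 0; i < blanks.getLength(); i++) {
            Node blank = blanks.item(i);
            blank.getParentNode().removeChild(blank);
        }

        Transformer transformer =
                TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        transformer.setOutputProperty(OutputKeys.INDENT, "yes");
        // Xalan-style property; other implementations may ignore it.
        transformer.setOutputProperty(
                "{http://xml.apache.org/xslt}indent-amount",
                String.valueOf(indent));

        StringWriter writer = new StringWriter();
        transformer.transform(new DOMSource(doc), new StreamResult(writer));
        return writer.toString();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<rpc-reply>\n<data>\n<smth/>\n</data>\n</rpc-reply>";
        System.out.println(format(xml, 4));
    }
}
```

With the whitespace-only nodes gone, the tree is the same one the single-line input produces, so the indentation setting should take effect on previously pretty printed strings as well.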
