
Extremely slow XSLT transformation in Java

Developer https://www.devze.com 2023-02-06 22:41 Source: Internet

I am trying to transform an XML document using XSLT. As input I have the XHTML source code of www.wordpress.org, and the XSLT is a dummy example that retrieves the site's title (it could just as well do nothing, since it doesn't change anything).

With every API or library I have tried, the transformation takes about 2 minutes! If you take a look at the wordpress.org source, you will notice that it is only 183 lines long. From what I found by googling, it is probably due to DOM tree building. No matter how simple the XSLT is, it always takes 2 minutes, which supports the idea that it is related to DOM building, but even so it should not take 2 minutes in my opinion.

Here is an example code (nothing special):

   TransformerFactory tFactory = TransformerFactory.newInstance();
   Transformer transformer = null;

   try {
       transformer = tFactory.newTransformer(
           new StreamSource("/home/pd/XSLT/transf.xslt"));

   } catch (TransformerConfigurationException e) {
       e.printStackTrace();
   }

   ByteArrayOutputStream outputStream = new ByteArrayOutputStream();

   System.out.println("START");
   try {
       transformer.transform(new SAXSource(new InputSource(
           new FileInputStream("/home/pd/XSLT/wordpress.xml"))),
           new StreamResult(outputStream));
   } catch (TransformerException e) {       
       e.printStackTrace();
   } catch (IOException e) {
       e.printStackTrace();
   }
   System.out.println("STOP");

   System.out.println(new String(outputStream.toByteArray()));

It's between START and STOP where Java "pauses" for 2 minutes. If I take a look at the processor or memory usage, nothing increases. It looks like the JVM really stopped...

Do you have any experience in transforming XML documents that are longer than 50 (this is a random number ;)) lines? As far as I have read, XSLT always needs to build a DOM tree in order to do its work. Fast transformation is crucial for me.

Thanks in advance, Piotr


Does the sample HTML file declare a DOCTYPE or use namespaces? If so, your XML parser may be attempting to retrieve content over the network. Parsers do not normally fetch anything from a namespace URI, but by default they do fetch the external DTD named in a DOCTYPE declaration (XHTML pages usually reference one hosted at w3.org). This is likely if each run takes exactly two minutes -- it's probably one or more TCP timeouts.

You can verify this by timing the parse on its own. Constructing the InputSource does not read the file; the WordPress XML is actually parsed inside transformer.transform, so parse the file separately and measure how long that step takes. After reviewing the sample file you posted, it does include a declared namespace (xmlns="http://www.w3.org/1999/xhtml").
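One way to run that check is to time the parse by itself, with no stylesheet involved. The following is a minimal sketch, not the asker's code: the class name ParseTimer and the method timeParseMillis are made up for illustration, and the file path is passed on the command line. If this step alone takes about two minutes, the delay is in parsing (most plausibly a network lookup) rather than in the XSLT.

```java
import java.io.FileInputStream;
import java.io.InputStream;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;

// Hypothetical helper: measures only the XML parse, no transformation.
class ParseTimer {

    /** Parses the stream into a DOM and returns the elapsed time in milliseconds. */
    static long timeParseMillis(InputStream in) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        DocumentBuilder builder = factory.newDocumentBuilder();

        long start = System.nanoTime();
        builder.parse(in); // any DTD or entity lookup happens during this call
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) throws Exception {
        // args[0] is the document to test, e.g. the wordpress.xml file from the question
        try (InputStream in = new FileInputStream(args[0])) {
            System.out.println("Parse took " + timeParseMillis(in) + " ms");
        }
    }
}
```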

To work around this, you can implement your own EntityResolver which essentially disables the URL-based resolution. You may need to use a DOM -- see DocumentBuilder's setEntityResolver method.

Here's a sample using DOM and disabling resolution (note -- this is untested):

try {
    DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance();
    DocumentBuilder db = dbFactory.newDocumentBuilder();
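    db.setEntityResolver(new EntityResolver() {

        @Override
        public InputSource resolveEntity(String publicId, String systemId) throws SAXException, IOException {
            // Returning null would tell the parser to fall back to its default
            // behaviour and open the system ID itself, so hand back an empty
            // source instead (needs java.io.StringReader).
            return new InputSource(new StringReader(""));
        }
    });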

    System.out.println("BUILDING DOM");

    Document doc = db.parse(new FileInputStream("/home/pd/XSLT/wordpress.xml"));

    ByteArrayOutputStream outputStream = new ByteArrayOutputStream();

    TransformerFactory tFactory = TransformerFactory.newInstance();
    Transformer transformer = tFactory.newTransformer(
        new StreamSource("/home/pd/XSLT/transf.xslt"));

    System.out.println("RUNNING TRANSFORM");

    transformer.transform(
            new DOMSource(doc.getDocumentElement()),
            new StreamResult(outputStream));

    System.out.println("TRANSFORMED CONTENTS BELOW");
    System.out.println(outputStream.toString());
} catch (Exception e) {
    e.printStackTrace();
}

If you want to use SAX, you would have to use a SAXSource with an XMLReader which uses your custom resolver.
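The SAX variant could look roughly like the sketch below. It is an illustration under assumptions, not tested against the asker's files: the class name SaxOfflineTransform and the method transform are invented here, and the XML and stylesheet are passed as strings to keep the example self-contained. The resolver answers every external entity, including the DTD, with an empty document so that no network request is made. One known limitation of this approach is that named entities defined only in the DTD (such as &nbsp;) can then no longer be resolved, so documents that use them will fail to parse.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;

// Hypothetical helper: runs an XSLT transform through a SAXSource whose
// XMLReader never goes to the network for DTDs or other external entities.
class SaxOfflineTransform {

    static String transform(String xml, String xslt) throws Exception {
        SAXParserFactory spf = SAXParserFactory.newInstance();
        spf.setNamespaceAware(true); // XSLT needs namespace-aware parsing
        XMLReader reader = spf.newSAXParser().getXMLReader();

        // Answer every external entity with an empty document instead of
        // letting the parser open the system ID.
        reader.setEntityResolver(new EntityResolver() {
            @Override
            public InputSource resolveEntity(String publicId, String systemId) {
                return new InputSource(new StringReader(""));
            }
        });

        Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xslt)));

        StringWriter out = new StringWriter();
        transformer.transform(
                new SAXSource(reader, new InputSource(new StringReader(xml))),
                new StreamResult(out));
        return out.toString();
    }
}
```

With files instead of strings, the same SAXSource can wrap an InputSource built from a FileInputStream, as in the question.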


The commenters who've posted that the answer likely resides with the EntityResolver are probably correct. However, rather than skipping the schemas altogether, a better solution may be to load them from the local file system.

So you could do something like this:

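  db.setEntityResolver(new EntityResolver() {

    @Override
    public InputSource resolveEntity(String publicId, String systemId) throws SAXException, IOException {
        // systemId is usually a full URL; keep only the file name after the last '/'
        String name = systemId.substring(systemId.lastIndexOf('/') + 1);
        // java.io.File does not understand a "classpath:" prefix, so load the
        // local copy from the xsd/ directory on the classpath instead
        InputStream in = getClass().getResourceAsStream("/xsd/" + name);
        if (in == null) {
            logger.error("File not found: " + systemId);
            return null; // the parser falls back to its default (possibly remote) lookup
        }
        InputSource is = new InputSource(in);
        is.setSystemId(systemId);
        return is;
    }
});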


Chances are the problem isn't with the call to transformer.transform itself. It's more likely that you are doing something in your XSLT that is taking forever. My suggestion would be to use a tool like Oxygen or XMLSpy to profile your XSLT and find out which templates are taking the longest to execute. Once you've determined this, you can begin to optimize those templates.
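Short of a full profiler, a coarse first check is to time stylesheet compilation and the transform call separately. This is only a sketch with made-up names (PhaseTimer, timePhases); it does not show which template is slow, and the second figure includes parsing the input document as well as executing the templates.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Templates;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical helper: separates stylesheet compilation time from transform time.
class PhaseTimer {

    /** Returns {compileMillis, transformMillis} for the given stylesheet and input. */
    static long[] timePhases(String xslt, String xml) throws Exception {
        TransformerFactory factory = TransformerFactory.newInstance();

        long t0 = System.nanoTime();
        // Compile the stylesheet once; Templates objects can be reused across runs
        Templates templates = factory.newTemplates(new StreamSource(new StringReader(xslt)));
        long t1 = System.nanoTime();

        // Parse the input and run the compiled stylesheet against it
        Transformer transformer = templates.newTransformer();
        transformer.transform(
                new StreamSource(new StringReader(xml)),
                new StreamResult(new StringWriter()));
        long t2 = System.nanoTime();

        return new long[] { (t1 - t0) / 1_000_000, (t2 - t1) / 1_000_000 };
    }
}
```

If the compile time is small and the transform time stays large even for a trivial stylesheet, the stylesheet itself is unlikely to be the bottleneck.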


If you are debugging your code on an Android device, make sure you try it without Eclipse attached to the process. When I was debugging my app, XSLT transformations were taking 8 seconds, whereas the same process took a tenth of a second on iOS in native code. Once I ran the code without Eclipse attached, the process took a comparable amount of time to its C-based counterpart.
