
Turn off DTD validation in XPathExpression evaluate()

Developer https://www.devze.com 2023-02-12 09:08 Source: web
I want to extract a small subtree from an XML file (100 MB) and need to turn off DTD validation, but I cannot find any way to do that.

XPath xpath = XPathFactory.newInstance().newXPath();  
XPathExpression expr = xpath.compile("//HEADER");  
Node node = (Node) expr.evaluate(new InputSource(new FileReader(file)), XPathConstants.NODE);

I tried using a DocumentBuilder with DTD validation turned off, but that's very slow.

Thanks,

Joo
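For comparison, here is a minimal sketch of the DocumentBuilder route the question describes, assuming the JDK's bundled JAXP parser. It switches off validation and, through the Xerces feature "http://apache.org/xml/features/nonvalidating/load-external-dtd", stops the parser from fetching the DTD at all; the XPath then runs against the parsed Document. The class name DomNoDtd, the sample XML and the file name missing.dtd are hypothetical. Note that the entire input is still parsed into an in-memory DOM before the XPath runs.

```java
import java.io.StringReader;

import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;

import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class DomNoDtd {

    // Builds a DOM without validating and without fetching the external DTD,
    // then evaluates the XPath expression against the parsed Document.
    static Node findNode(InputSource source, String expression) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setValidating(false);
        // Xerces feature URI, also recognized by the JDK's bundled parser:
        // skip loading the DTD named in the DOCTYPE declaration
        dbf.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
        Document doc = dbf.newDocumentBuilder().parse(source);

        XPath xpath = XPathFactory.newInstance().newXPath();
        return (Node) xpath.compile(expression).evaluate(doc, XPathConstants.NODE);
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical input: "missing.dtd" does not exist, but it is never read
        String xml = "<!DOCTYPE DOC SYSTEM \"missing.dtd\"><DOC><HEADER>h</HEADER><BODY/></DOC>";
        Node header = findNode(new InputSource(new StringReader(xml)), "//HEADER");
        System.out.println(header.getNodeName() + " = " + header.getTextContent());
    }
}
```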


It is so slow because your XPath criterion is too vague, which forces a full scan of all the nodes: //HEADER means the XPath engine has to visit every single node in your 100 MB document to select the ones named HEADER. If you can make the XPath expression more specific, you should see a dramatic improvement.
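To illustrate the suggestion, here is a minimal sketch that selects the same element once with the vague //HEADER and once with a path anchored at the root. It assumes a hypothetical layout in which HEADER is a direct child of a root element named DOC; substitute the real element names from your file. It does not time anything; it only shows that the specific expression finds the same node.

```java
import java.io.StringReader;

import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;

import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class SpecificXPath {

    // Evaluates an XPath expression against an already-parsed Document.
    static Node select(Document doc, String expression) throws Exception {
        return (Node) XPathFactory.newInstance().newXPath()
                .evaluate(expression, doc, XPathConstants.NODE);
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical layout: HEADER is a direct child of the root element DOC
        String xml = "<DOC><HEADER>h</HEADER><BODY><ITEM/><ITEM/></BODY></DOC>";
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));

        // Vague: descendant search that may visit every node in the tree
        Node anywhere = select(doc, "//HEADER");
        // Specific: one step from the document to DOC, one step to HEADER
        Node anchored = select(doc, "/DOC/HEADER");

        System.out.println(anywhere.isSameNode(anchored)); // true
    }
}
```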

Other than that, the code below is what I used in the past to prevent DTD validation. It forces Xerces as the SAX parser and explicitly sets a number of Xerces-specific features. Again, though, this will probably not affect the response time significantly.

import java.io.File;
import java.io.StringReader;

import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;

import org.apache.xerces.jaxp.SAXParserFactoryImpl;
import org.xml.sax.InputSource;

[...]

    private static SAXParserFactory spf;

    private BillCooker() throws Exception {

        // Force Apache Xerces as the JAXP SAX parser implementation
        System.setProperty("javax.xml.parsers.SAXParserFactory", "org.apache.xerces.jaxp.SAXParserFactoryImpl");

        spf = SAXParserFactoryImpl.newInstance();
        spf.setNamespaceAware(true);
        // Do not validate against the DTD...
        spf.setValidating(false);
        spf.setFeature("http://xml.org/sax/features/validation", false);
        // ...and do not load the DTD grammar or the external DTD file at all
        spf.setFeature("http://apache.org/xml/features/nonvalidating/load-dtd-grammar", false);
        spf.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false);

        [...]
    }

I trimmed it down to the lines relevant to validation.
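A configured SAXParserFactory cannot be handed to the XPath call in the question directly, because XPathExpression.evaluate() takes either an InputSource, which it parses with its own parser, or a context item such as a DOM Node. One way to combine them, sketched below as an assumption rather than the answer's own code, is to parse through the configured XMLReader into a DOM with an identity Transformer and then evaluate the XPath on the result. This sketch uses the JDK's default SAXParserFactory instead of forcing Apache Xerces, and sets only the standard SAX validation feature and the load-external-dtd feature; the class name SaxNoDtdXPath and the sample XML are hypothetical.

```java
import java.io.StringReader;

import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMResult;
import javax.xml.transform.sax.SAXSource;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;

import org.w3c.dom.Node;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;

class SaxNoDtdXPath {

    // Non-validating factory that never loads the external DTD
    // (JDK default implementation, not a forced Apache Xerces).
    static SAXParserFactory noDtdFactory() throws Exception {
        SAXParserFactory spf = SAXParserFactory.newInstance();
        spf.setNamespaceAware(true);
        spf.setValidating(false);
        spf.setFeature("http://xml.org/sax/features/validation", false);
        spf.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
        return spf;
    }

    // Feeds the input through the configured XMLReader into a DOM via an
    // identity transform, then evaluates the XPath on that DOM.
    static Node findNode(InputSource input, String expression) throws Exception {
        XMLReader reader = noDtdFactory().newSAXParser().getXMLReader();
        DOMResult result = new DOMResult();
        Transformer identity = TransformerFactory.newInstance().newTransformer();
        identity.transform(new SAXSource(reader, input), result);
        return (Node) XPathFactory.newInstance().newXPath()
                .evaluate(expression, result.getNode(), XPathConstants.NODE);
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical input: "missing.dtd" does not exist, but it is never read
        String xml = "<!DOCTYPE DOC SYSTEM \"missing.dtd\"><DOC><HEADER>h</HEADER></DOC>";
        Node header = findNode(new InputSource(new StringReader(xml)), "/DOC/HEADER");
        System.out.println(header.getTextContent());
    }
}
```

Like the DocumentBuilder route, this still materializes the whole document as a DOM before the XPath is evaluated.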
