I can开发者_如何学Gonot make TagSoup work. I'm using the code that follows, but when I print the Node returned by the parser (the line with System.err.println(doc);) , I always get "[#document: null]".
I don't know how to find the bug in this code or, whichever it is, the origin of the problem. Please help!
public final Document parseDOM(final File fileToParse) {
Parser p = new Parser();
SAX2DOM sax2dom = null;
org.w3c.dom.Node doc = null;
try {
URL url = new URL("http://stackoverflow.com/");
p.setFeature(Parser.namespacesFeature, false);
p.setFeature(Parser.namespacePrefixesFeature, false);
sax2dom = new SAX2DOM();
p.setContentHandler(sax2dom);
p.parse(new InputSource(new InputStreamReader(url.openStream())));
doc = sax2dom.getDOM();
System.err.println(doc);
} catch (Exception e) {
// TODO handle exception
e.printStackTrace();
}
return doc.getOwnerDocument();
}
From the documentation on getOwnerDocument
:
When this node is a Document or a DocumentType which is not used with any Document yet, this is null.
Since getDOM
in your case should return a Document
, you could simply cast the return value or change the type of doc
to Document
.
Your parser is working, but you just can't print out a node like that. The easiest way to print out a node and all its children is to use an XML Serializer like this:
Writer out = new StringWriter();
XMLSerializer serializer = new XMLSerializer(out, new OutputFormat());
serializer.serialize(doc);
System.out.println(out.toString());
精彩评论