Building a DOM Document with TagSoup

Developer https://www.devze.com 2023-01-23 08:23 Source: Internet

I cannot make TagSoup work. I'm using the code below, but when I print the Node returned by the parser (the line with System.err.println(doc);), I always get "[#document: null]".

I don't know how to find the bug in this code, or whatever else is causing the problem. Please help!

public final Document parseDOM(final File fileToParse) {
    Parser p = new Parser();
    SAX2DOM sax2dom = null;
    org.w3c.dom.Node doc = null;

    try {
        URL url = new URL("http://stackoverflow.com/");
        p.setFeature(Parser.namespacesFeature, false);
        p.setFeature(Parser.namespacePrefixesFeature, false);
        sax2dom = new SAX2DOM();
        p.setContentHandler(sax2dom);
        p.parse(new InputSource(new InputStreamReader(url.openStream())));
        doc = sax2dom.getDOM();
        System.err.println(doc);
    } catch (Exception e) {
        // TODO handle exception
        e.printStackTrace();
    }

    return doc.getOwnerDocument();
}

From the documentation on getOwnerDocument:

When this node is a Document or a DocumentType which is not used with any Document yet, this is null.

Since getDOM() should return the Document node itself in your case, doc.getOwnerDocument() returns null. Return doc directly instead: cast the return value of getDOM() to Document, or change the type of doc to Document.
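A minimal sketch of the fix, using only the JDK so it runs without extra jars. Two substitutions are assumptions, not the asker's setup: an identity Transformer writing into a DOMResult replaces Xalan's SAX2DOM, and the JDK's own XMLReader replaces TagSoup's Parser (which is also an XMLReader, so it could be dropped in to handle real HTML). The class and method names are hypothetical. The point is the last step: the result node is the Document itself, so it is cast and returned rather than asked for its owner document.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMResult;
import javax.xml.transform.sax.SAXSource;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;

public class Sax2DomSketch {

    // Hypothetical helper: builds a DOM tree from SAX events.
    // Assumption: with TagSoup on the classpath, new org.ccil.cowan.tagsoup.Parser()
    // could be used as `reader` instead of the JDK's XML-only parser.
    static Document parseDOM(String markup) throws Exception {
        SAXParserFactory spf = SAXParserFactory.newInstance();
        spf.setNamespaceAware(true);
        XMLReader reader = spf.newSAXParser().getXMLReader();

        // Identity transform: SAX events in, DOM tree out (stand-in for SAX2DOM).
        Transformer identity = TransformerFactory.newInstance().newTransformer();
        DOMResult result = new DOMResult(); // no node given, so a new Document is created
        identity.transform(new SAXSource(reader, new InputSource(new StringReader(markup))), result);

        Node root = result.getNode();
        // root IS the Document, so root.getOwnerDocument() would return null here.
        return (Document) root;
    }

    public static void main(String[] args) throws Exception {
        Document doc = parseDOM("<html><body><p>hi</p></body></html>");
        System.out.println(doc.getOwnerDocument() == null);                     // true
        System.out.println(doc.getDocumentElement().getNodeName());             // html
        System.out.println(doc.getDocumentElement().getOwnerDocument() == doc); // true
    }
}
```

Because the DOMResult is created without a node, the transformation creates a fresh Document as the holder of the result, which is what makes the cast safe.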

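Your parser is working, but printing a Node that way does not show its contents: println calls the node's toString(), which in this DOM implementation only reports the node name and value, and a Document's value is always null, hence "[#document: null]". The easiest way to print out a node and all its children is to use an XML serializer like this: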

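    // XMLSerializer and OutputFormat come from Xerces (org.apache.xml.serialize)
    Writer out = new StringWriter();
    XMLSerializer serializer = new XMLSerializer(out, new OutputFormat());
    // serialize() accepts a Document, Element or DocumentFragment, so cast the Node
    serializer.serialize((Document) doc);
    System.out.println(out.toString());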
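XMLSerializer and OutputFormat belong to Xerces' org.apache.xml.serialize package, which later Xerces releases deprecate. A sketch of the same idea with only the JDK, swapping in the standard identity Transformer; the NodePrinter class and toXml method are hypothetical names, and the small hand-built document stands in for the parsed page:

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

public class NodePrinter {

    // Serializes any DOM Node (Document, Element, ...) and its children to a String
    // using the JDK's identity Transformer instead of Xerces' XMLSerializer.
    static String toXml(Node node) throws Exception {
        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        t.transform(new DOMSource(node), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        // Small stand-in document, built by hand instead of by TagSoup.
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        Element html = doc.createElement("html");
        html.appendChild(doc.createElement("body"));
        doc.appendChild(html);

        System.out.println(doc);        // only a node summary, not the tree
        System.out.println(toXml(doc)); // the whole tree as markup
    }
}
```

Setting OutputKeys.INDENT to "yes" on the same Transformer would pretty-print the output instead.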