Normally in PHP, I would just parse the old document and write to the new document while ignoring the unwanted elements.
This was the first solution I came up with:
// Parse the XML string into a DOM document.
DocumentBuilder builder = DocumentBuilderFactory
    .newInstance()
    .newDocumentBuilder();
StringReader reader = new StringReader( xml );
Document document = builder.parse( new InputSource(reader) );
// Select the unwanted nodes.
XPathExpression expr = XPathFactory
    .newInstance()
    .newXPath()
    .compile( ... );
NodeList nodes = (NodeList) expr.evaluate(document, XPathConstants.NODESET);
// Detach each match from its own parent: removeChild() on the root element
// throws a DOMException for nodes that are not its direct children.
for (int i = 0; i < nodes.getLength(); i++) {
    Node node = nodes.item(i);
    node.getParentNode().removeChild( node );
}
As you can see, it's rather long. Since I prefer simple code, I took Ahmed's advice in the hope of finding a shorter solution, and with HtmlUnit's DOM API I came up with this:
List<?> elements = page.getByXPath( ... );
for ( Object o : elements ) {
    DomNode node = (DomNode) o;
    node.getParentNode().removeChild( node );
}
Please note that these are just snippets: I omitted the imports and the XPath expressions, but you get the idea.
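For anyone who wants something they can compile, here is a minimal self-contained sketch of the first (plain JAXP) approach with the imports filled in. The class name RemoveNodes, the helper removeMatching, the sample XML and the //script expression are all made up for illustration; they are not the expressions from my real code.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class RemoveNodes {

    // Hypothetical helper: parses xml, detaches every node matched by xpath,
    // and returns the remaining document serialized as a string.
    static String removeMatching(String xml, String xpath) {
        try {
            Document document = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));

            NodeList nodes = (NodeList) XPathFactory.newInstance()
                    .newXPath()
                    .evaluate(xpath, document, XPathConstants.NODESET);

            for (int i = 0; i < nodes.getLength(); i++) {
                Node node = nodes.item(i);
                Node parent = node.getParentNode();
                // Attribute nodes and the document itself have no parent; skip them.
                if (parent != null) {
                    parent.removeChild(node);
                }
            }

            // Serialize the modified DOM without the <?xml ...?> declaration.
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(document), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            // Wrap the checked parser/XPath/transformer exceptions for brevity.
            throw new IllegalStateException("Could not process document", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<page><div><script>x()</script><p>keep</p></div>"
                   + "<script>y()</script></page>";
        System.out.println(removeMatching(xml, "//script"));
    }
}
```

Running main prints the sample document with both script elements gone, including the nested one, because each node is removed from its own parent rather than from the root element.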
Have a look at the DOM methods; you can use them to remove nodes.
http://htmlunit.sourceforge.net/apidocs/com/gargoylesoftware/htmlunit/html/DomNode.html