
How to remove elements from a page in HtmlUnit

Developer https://www.devze.com 2023-01-17 05:35 Source: Internet

Normally in PHP, I would just parse the old document and write it out to a new document, skipping the unwanted elements.
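For comparison, that parse-and-copy idea can also be expressed in Java with the JDK's built-in DOM API. This is a minimal sketch, not the author's code: the class and method names (CopyWithoutElements, copyWithout) and the skip rule (drop every element with one given tag name, here a hypothetical "script") are assumptions for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

final class CopyWithoutElements {

    // Parse the source XML, then build a fresh Document containing every node
    // except elements named 'skipTag' (hypothetical rule) and their subtrees.
    public static Document copyWithout(String xml, String skipTag) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document source = builder.parse(new InputSource(new StringReader(xml)));
        Document target = builder.newDocument();
        copyChildren(source, target, target, skipTag);
        return target;
    }

    // Recursively copy the children of 'from' under 'to', ignoring unwanted elements.
    private static void copyChildren(Node from, Node to, Document target, String skipTag) {
        for (Node child = from.getFirstChild(); child != null; child = child.getNextSibling()) {
            short type = child.getNodeType();
            // Skip unwanted elements (with everything inside them) and any DOCTYPE,
            // which importNode cannot copy
            if (type == Node.DOCUMENT_TYPE_NODE
                    || (type == Node.ELEMENT_NODE && child.getNodeName().equals(skipTag))) {
                continue;
            }
            // Shallow import: copies the node itself (and an element's attributes)
            Node copy = target.importNode(child, false);
            to.appendChild(copy);
            copyChildren(child, copy, target, skipTag);
        }
    }

    public static void main(String[] args) throws Exception {
        Document doc = copyWithout("<root><a/><script>x</script><b/></root>", "script");
        System.out.println(doc.getElementsByTagName("script").getLength()); // prints 0
    }
}
```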


This was the first solution I came up with:

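            import java.io.StringReader;
            import javax.xml.parsers.DocumentBuilder;
            import javax.xml.parsers.DocumentBuilderFactory;
            import javax.xml.xpath.XPathConstants;
            import javax.xml.xpath.XPathExpression;
            import javax.xml.xpath.XPathFactory;
            import org.w3c.dom.Document;
            import org.w3c.dom.Node;
            import org.w3c.dom.NodeList;
            import org.xml.sax.InputSource;

            // Parse the XML string into a W3C DOM Document
            DocumentBuilder builder = DocumentBuilderFactory
                                      .newInstance()
                                      .newDocumentBuilder();

            StringReader reader = new StringReader( xml );
            Document document = builder.parse( new InputSource(reader) );

            // XPath expression omitted here
            XPathExpression expr = XPathFactory
                                   .newInstance()
                                   .newXPath()
                                   .compile( ... );

            NodeList nodes = (NodeList) expr.evaluate(document, XPathConstants.NODESET);

            // Remove each match from its own parent: calling removeChild on the
            // document element only works for that element's direct children
            for (int i = 0; i < nodes.getLength(); i++) {
                Node node = nodes.item(i);
                Node parent = node.getParentNode();
                if (parent != null) {
                    parent.removeChild( node );
                }
            }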

As you can see, it's rather long. Since I strive for simplicity, I took Ahmed's advice in the hope of finding a better solution, and came up with this:

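            import java.util.List;
            import com.gargoylesoftware.htmlunit.html.DomNode;

            // 'page' is an HtmlPage that has already been loaded; XPath omitted here
            List<?> elements = page.getByXPath( ... );

            // Detach every matched node from its parent in HtmlUnit's DOM
            for( Object o : elements ) {
                DomNode node = (DomNode) o;
                node.getParentNode().removeChild( node );
            }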

Please note these are just snippets with the XPath expressions left out, but you get the idea.
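Because the snippets leave out the XPath expressions, here is a self-contained sketch of the first (JAXP) approach that runs on a plain JDK. The input markup, the expression //script and the names RemoveByXPath/removeMatches are hypothetical. Each match is removed through its own parent, because removeChild only accepts a direct child of the node it is called on.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class RemoveByXPath {

    // Parse 'xml', remove every node matched by 'expression', return the modified Document.
    public static Document removeMatches(String xml, String expression) throws Exception {
        Document document = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));

        NodeList nodes = (NodeList) XPathFactory.newInstance()
                .newXPath()
                .evaluate(expression, document, XPathConstants.NODESET);

        // Detach each match from its own parent, so matches at any depth are handled
        for (int i = 0; i < nodes.getLength(); i++) {
            Node node = nodes.item(i);
            Node parent = node.getParentNode();
            if (parent != null) {
                parent.removeChild(node);
            }
        }
        return document;
    }

    public static void main(String[] args) throws Exception {
        Document doc = removeMatches(
                "<html><body><p>keep</p><div><script>drop</script></div></body></html>",
                "//script");
        System.out.println(doc.getElementsByTagName("script").getLength()); // prints 0
        System.out.println(doc.getElementsByTagName("p").getLength());      // prints 1
    }
}
```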


Have a look at the DOM methods on DomNode; they let you remove nodes.

http://htmlunit.sourceforge.net/apidocs/com/gargoylesoftware/htmlunit/html/DomNode.html
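HtmlUnit's DomNode implements the standard org.w3c.dom.Node interface, so the same removal methods exist on any W3C DOM. The sketch below uses the JDK's DOM rather than HtmlUnit (the class name RemoveByTagName and the tag "ad" are hypothetical) and illustrates one pitfall: getElementsByTagName returns a live NodeList, so iterating forward while removing would skip matches, because each removal shifts the remaining items down one index. Walking the list backwards avoids that.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class RemoveByTagName {

    // Remove every element named 'tag' using only core DOM methods.
    public static Document removeAll(String xml, String tag) throws Exception {
        Document document = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));

        // Live NodeList: it shrinks after each removal, so walk it from the end
        NodeList nodes = document.getElementsByTagName(tag);
        for (int i = nodes.getLength() - 1; i >= 0; i--) {
            Node node = nodes.item(i);
            node.getParentNode().removeChild(node);
        }
        return document;
    }

    public static void main(String[] args) throws Exception {
        Document doc = removeAll("<root><ad/><p>text</p><ad/><ad/></root>", "ad");
        System.out.println(doc.getElementsByTagName("ad").getLength()); // prints 0
    }
}
```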

