Grab xPath content without surrounding markup

Developer https://www.devze.com 2023-03-06 09:29 Source: web

How do you grab the content of xPath without copying the surrounding mark?

<div id="node-123" class="clearfix">
                    <div class="content">
                        <div class="body">
                            <p><img src="/images/image.jpg"/></p>
                            <p>Some content ....</p>
                        </div>    
                    </div>
                </div>

If I use //div[@id='node-123']/div/div, I still get the surrounding <div class="body">, which is not what I want.

What I want is the content of <div class="body"> without the <div class="body"> tags themselves, while preserving the other markup inside it (p, img, etc.).

I tried using a wildcard, //div[@id='node-123']/div/div/*, but it only fetches the first p, even though there can be two or more. Using node() fetches nothing.

Any hint would be very much appreciated.

Thanks

Use:

//div[@id='node-123']/div/div/node()

This selects all nodes (elements, text nodes, processing instructions, and comments) that are children of any div that is a child of a div that is a child of any div in the document whose id attribute is 'node-123'. Unlike *, which matches only element children, node() also returns the text between the elements, so the children can be copied out without the enclosing <div class="body">.
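Python's standard-library `xml.etree.ElementTree` supports only a subset of XPath and has no `node()` test, but the same "children without the wrapper" result can be approximated. The sketch below finds the body div with a supported path, then concatenates its leading text with each serialized child (ElementTree's `tostring()` includes a child's trailing text). The `<html>` wrapper and the `inner_markup` helper are illustrative assumptions, not part of the original question; also note that the default parser drops comments, so this is not an exact match for `node()`.

```python
import xml.etree.ElementTree as ET

# The question's snippet, wrapped in a hypothetical <html> root so that
# the ".//" search below can also match the node-123 div itself.
SNIPPET = """<html><div id="node-123" class="clearfix">
  <div class="content">
    <div class="body">
      <p><img src="/images/image.jpg"/></p>
      <p>Some content ....</p>
    </div>
  </div>
</div></html>"""


def inner_markup(doc: ET.Element, path: str) -> str:
    """Illustrative helper (not a library API): return the markup inside
    the first element matching `path`, without its own start/end tags."""
    elem = doc.find(path)
    if elem is None:
        return ""
    # Text before the first child element (a text node under node()).
    parts = [elem.text or ""]
    # tostring() serializes each child together with its tail text,
    # i.e. the text node that follows it inside `elem`.
    parts.extend(ET.tostring(child, encoding="unicode") for child in elem)
    return "".join(parts)


doc = ET.fromstring(SNIPPET)
print(inner_markup(doc, ".//div[@id='node-123']/div/div"))
```

Both p elements and the img come back, and no div tags appear in the output. ElementTree re-serializes empty elements as `<img ... />`, so the output is equivalent to the input markup but not byte-for-byte identical.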

Warning: When the structure of the XML document is statically known, it is good practice to avoid the // pseudo-operator. Using // usually forces a traversal of the whole tree, which can make evaluation much slower.
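As a rough illustration of an anchored lookup, here is a sketch with Python's `xml.etree.ElementTree`, assuming (as in the question's snippet) that the node-123 div is the document's root element. It checks the root's id directly and then follows a fixed child path by class, instead of scanning every descendant. The class-based path is an assumption drawn from the snippet, not something the answer prescribes.

```python
import xml.etree.ElementTree as ET

SNIPPET = """<div id="node-123" class="clearfix">
  <div class="content">
    <div class="body">
      <p><img src="/images/image.jpg"/></p>
      <p>Some content ....</p>
    </div>
  </div>
</div>"""

root = ET.fromstring(SNIPPET)

# Anchored lookup: test the root element itself, then follow a fixed
# child path rather than scanning every descendant as ".//" does.
body = None
if root.get("id") == "node-123":
    body = root.find("div[@class='content']/div[@class='body']")

# Element children of the body div (text nodes live in .text/.tail).
paragraphs = [] if body is None else list(body)
print([ET.tostring(p, encoding="unicode").strip() for p in paragraphs])
```

The same idea in plain XPath would be to start from the document root with single-step / axes instead of //.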

The actual problem was an unterminated img tag in the original page: <img src="/images/image.jpg"> rather than <img src="/images/image.jpg"/>, so the document was not well-formed XML.
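A strict XML parser rejects the unterminated img tag outright, which is consistent with the XPath query returning nothing on the original page. A minimal check with Python's `xml.etree.ElementTree` (the `is_well_formed` helper is illustrative only):

```python
import xml.etree.ElementTree as ET

GOOD = '<div class="body"><p><img src="/images/image.jpg"/></p><p>Some content ....</p></div>'
BAD = '<div class="body"><p><img src="/images/image.jpg"></p><p>Some content ....</p></div>'


def is_well_formed(markup: str) -> bool:
    """Illustrative helper: True if `markup` parses as XML, else False."""
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        return False
    return True


print(is_well_formed(GOOD))  # True
print(is_well_formed(BAD))   # False: <img> is never closed
```

For real-world HTML that is not XHTML, running the query through an HTML-tolerant parser (for example lxml's `lxml.html` module) is usually safer than feeding the page to a strict XML parser.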