What is the XPath expression that involves multiple exclusions?

Developer https://www.devze.com 2023-03-10 23:59 Source: Internet
Suppose I have html like this:

<div id="wrap">
  <div id="content">
    <span>some content</span>
    <div id="s1">
     <p> some text </p>
    </div>
    <h2 id="sec1">
      <span> some text </span>
      <p> some text </p>
    </h2>
    <h2 id="sec1">
      <span> some text </span>
      <div> some more text </div> 
      <p> some text </p>
    </h2>
    <h2 id="sec2">
      <span> do not select me some text </span>
      <div> do not select me some more text </div> 
      <p> do not select me some text </p>
    </h2>
    <h2 id="sec3">
      <span> do not select me some text </span>
      <div> do not select me some more text </div> 
      <p> do not select me some text </p>
    </h2>
   </div>
 </div>

What is the XPath expression that selects all text nodes except those under the h2 with id=sec2 and the h2 with id=sec3?


Literally, "all text nodes except those under the h2 with id=sec2 and the h2 with id=sec3":

//text()[not(ancestor::h2[@id='sec2' or @id='sec3'])]
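To make the semantics of that expression concrete, here is a minimal Python sketch that reproduces it by hand. It uses only the standard library's `xml.etree.ElementTree`, whose limited XPath subset has no `text()` or `ancestor::` support, so the exclusion is done with a recursive walk instead of the expression itself. The shortened sample markup and the names `EXCLUDED_IDS` and `text_nodes` are assumptions made for illustration. Unlike the XPath, the sketch also skips whitespace-only text nodes.

```python
import xml.etree.ElementTree as ET

# Shortened, well-formed version of the markup in the question (assumption).
HTML = """
<div id="wrap">
  <div id="content">
    <span>some content</span>
    <h2 id="sec1"><span>keep me</span></h2>
    <h2 id="sec2"><span>do not select me</span></h2>
    <h2 id="sec3"><p>do not select me either</p></h2>
  </div>
</div>
"""

EXCLUDED_IDS = {"sec2", "sec3"}

def text_nodes(elem, excluded=False):
    """Yield non-blank text nodes that have no excluded <h2> ancestor."""
    # Text inside elem is excluded if elem, or any ancestor, is a listed <h2>.
    excluded = excluded or (elem.tag == "h2" and elem.get("id") in EXCLUDED_IDS)
    if not excluded and elem.text and elem.text.strip():
        yield elem.text.strip()
    for child in elem:
        yield from text_nodes(child, excluded)
        # child.tail is the text after the child's end tag; its parent is elem.
        if not excluded and child.tail and child.tail.strip():
            yield child.tail.strip()

root = ET.fromstring(HTML)
kept = list(text_nodes(root))
print(kept)  # ['some content', 'keep me']
```

With a full XPath 1.0 engine the expression above can be evaluated directly and no manual walk is needed.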

However, I suspect that you don't really want that, because you would lose the <span> and <p> structure. Would it be correct to infer that you want to select all the child elements of the content <div>, except for the <h2> elements whose ids are sec2 and sec3? If so:

/div/div[@id = 'content']/*[not(self::h2 and (@id = 'sec2' or @id = 'sec3'))]
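As a rough illustration of what this second expression returns, the following Python sketch applies the same filter with the standard library's `xml.etree.ElementTree`. That module's XPath subset does not support `not()`, `or` or `self::`, so only the `div[@id='content']` step is expressed as a path and the exclusion is written as a list comprehension. The shortened sample markup and the variable names are assumptions.

```python
import xml.etree.ElementTree as ET

# Shortened, well-formed version of the markup in the question (assumption).
HTML = """
<div id="wrap">
  <div id="content">
    <span>some content</span>
    <div id="s1"><p>some text</p></div>
    <h2 id="sec1"><span>some text</span></h2>
    <h2 id="sec2"><span>do not select me</span></h2>
    <h2 id="sec3"><span>do not select me</span></h2>
  </div>
</div>
"""

EXCLUDED_IDS = {"sec2", "sec3"}

root = ET.fromstring(HTML)  # root is the outer <div id="wrap">
content = root.find("div[@id='content']")

# Keep every direct child except <h2> elements whose id is sec2 or sec3.
kept = [
    child for child in content
    if not (child.tag == "h2" and child.get("id") in EXCLUDED_IDS)
]

print([(child.tag, child.get("id")) for child in kept])
# [('span', None), ('div', 's1'), ('h2', 'sec1')]
```

The kept elements are whole subtrees, so the <span> and <p> structure inside each one is preserved.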

But you should also be aware that the content of an <h2> element is meant to be only the title of a section, not the whole text of the section. By putting <div> and <p> elements inside an <h2>, you are not using it the way it was designed.


All elements under an <h2> (except …):

//h2[not(@id = 'sec2' or @id = 'sec3')]/*

All <span>, <div> or <p> elements anywhere (except …):

//*[self::span or self::div or self::p][not(parent::h2/@id = 'sec2' or parent::h2/@id = 'sec3')]

Alternative notation (note the parentheses and the slightly changed predicate):

(//span|//div|//p)[not(parent::h2[@id = 'sec2' or @id = 'sec3'])]
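The `//h2[...]/*` form and the two "anywhere" forms do not select the same set: the latter also match <span>, <div> and <p> elements that are not inside any <h2>. The Python sketch below mimics both selections with the standard library's `xml.etree.ElementTree`. Its XPath subset has no `parent::` axis or `not()`, so a child-to-parent map stands in for them. The sample markup and the names `parent_of`, `in_excluded_h2`, `under_h2` and `anywhere` are assumptions.

```python
import xml.etree.ElementTree as ET

# Shortened, well-formed sample markup (assumption).
HTML = """
<div id="wrap">
  <div id="content">
    <span>intro</span>
    <h2 id="sec1"><span>a</span><p>b</p></h2>
    <h2 id="sec2"><span>x</span><p>y</p></h2>
  </div>
</div>
"""

EXCLUDED_IDS = {"sec2", "sec3"}

root = ET.fromstring(HTML)
# ElementTree elements do not know their parent, so build a lookup table.
parent_of = {child: parent for parent in root.iter() for child in parent}

def in_excluded_h2(elem):
    """True if elem's parent is an <h2> whose id is in EXCLUDED_IDS."""
    parent = parent_of.get(elem)
    return (parent is not None and parent.tag == "h2"
            and parent.get("id") in EXCLUDED_IDS)

# Mirrors //h2[not(@id = 'sec2' or @id = 'sec3')]/*
under_h2 = [child for h2 in root.iter("h2")
            if h2.get("id") not in EXCLUDED_IDS
            for child in h2]

# Mirrors (//span|//div|//p)[not(parent::h2[@id = 'sec2' or @id = 'sec3'])]
anywhere = [e for e in root.iter()
            if e.tag in {"span", "div", "p"} and not in_excluded_h2(e)]

print([e.tag for e in under_h2])  # ['span', 'p']
print([e.tag for e in anywhere])  # ['div', 'div', 'span', 'span', 'p']
```

In this sample the second selection also includes the two wrapper <div> elements and the introductory <span>, which is why the choice between the expressions depends on whether content outside the <h2> elements is wanted.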