XPath: Select all following nodes until some node

Developer https://www.devze.com 2023-01-28 16:30 Source: Internet
I'm trying to parse the structure below into sets of journey options so that I can find all the possible ways to get from Pontypridd to Llangollen and back.

Using XPath, I can do //div[@class='JourneyOptions'] to select all the rows that actually contain journey info. Outside of XPath, I could iterate over each row to decide whether it should be added to a set of journeys, or whether it is the first in a new set of journeys.

In the below example, all the journey sets will contain two journeys, but a set may contain just one journey (a 'direct' journey), or more than two (more than one 'connection').

Is there an XPath expression to select all the journeys of the first outbound set, all the journeys of the second outbound set and so on?

The first journey in every set has a radio input with an integer value. I could dynamically generate expressions from these values to select each set, but I would need to know when to stop generating (or just wait for the XPath to fail).

<div class='TableHolder'>

  <p>...</p>
  <h2 id='DirectionHeader'>Outbound Options</h2>
  <p>Pontypridd to Llangollen, 30/11/1910</p>

  <!-- first part of the first journey from Pontypridd to Llangollen -->
  <div class='JourneyOptions'>
    <div class='Journey'>
      <div class='ColumnOne'>
          <input type='radio' checked='checked' name='out' value='1'>
      </div>
      ... some more divs of parseable journey info ...
    </div>
  </div>

  <!-- second part of the first journey from Pontypridd to Llangollen -->
  <div class='JourneyOptions'>
    <div class='Journey'>
      <div class='ConnectingJournies'>
          <p>...</p>
      </div>
      <div class='ColumnOne'>
          ... doesn't contain a radio input ...
      </div>
      ... some more divs of parseable journey info ...
    </div>
  </div>

  <!-- first part of the second journey from Pontypridd to Llangollen -->
  <div class='JourneyOptions'>
    <div class='Journey'>
      <div class='ColumnOne'>
          <input type='radio' name='out' value='2'>
      </div>
      ... some more divs of parseable journey info ...
    </div>
  </div>

  <!-- second part of the second journey from Pontypridd to Llangollen -->
  <div class='JourneyOptions'>
    <div class='Journey'>
      <div class='ConnectingJournies'>
          <p>...</p>
      </div>
      <div class='ColumnOne'>
          ... doesn't contain a radio input ...
      </div>
      ... some more divs of parseable journey info ...
    </div>
  </div>

  ... some more outbound journey options ...

  <p>...</p>
  <h2 id='DirectionHeader'>Inbound Options</h2>
  <p>Llangollen to Pontypridd, 07/11/1910</p>

  <!-- first part of the first journey from Llangollen to Pontypridd -->
  <div class='JourneyOptions'>
    <div class='Journey'>
      <div class='ColumnOne'>
          <input type='radio' checked='checked' name='in' value='1'>
      </div>
      ... some more divs of parseable journey info ...
    </div>
  </div>

  <!-- second part of the first journey from Llangollen to Pontypridd -->
  <div class='JourneyOptions'>
    <div class='Journey'>
      <div class='ConnectingJournies'>
          <p>...</p>
      </div>
      <div class='ColumnOne'>
          ... doesn't contain a radio input ...
      </div>
      ... some more divs of parseable journey info ...
    </div>
  </div>

  <!-- first part of the second journey from Llangollen to Pontypridd -->
  <div class='JourneyOptions'>
    <div class='Journey'>
      <div class='ColumnOne'>
          <input type='radio' name='in' value='2'>
      </div>
      ... some more divs of parseable journey info ...
    </div>
  </div>

  <!-- second part of the second journey from Llangollen to Pontypridd -->
  <div class='JourneyOptions'>
    <div class='Journey'>
      <div class='ConnectingJournies'>
          <p>...</p>
      </div>
      <div class='ColumnOne'>
          ... doesn't contain a radio input ...
      </div>
      ... some more divs of parseable journey info ...
    </div>
  </div>

  ... some more inbound journey options ...
</div>

Sorry for the large example, but I think this is as small as I can make it while still being representative of my problem.


Node sets are just... well, sets: collections of unique nodes, with the ordering left to the host language (document order, for the most part). If you want the result to express some kind of hierarchy or grouping, the answer is that you can't.

So, you could select the start of each group with:

/div[@class='TableHolder']  
    /div[@class='JourneyOptions']
        [div[@class='Journey'] 
          /div[@class='ColumnOne'] 
              /input[@type='radio']
        ]

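One group at a time (there are many options). The expression below selects the first group; change the 1 to N to select the N-th group, counting in document order across both outbound and inbound options: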

/div[@class='TableHolder']
    /div[@class='JourneyOptions']
        [count(
            (self::div|preceding-sibling::div)
                [div[@class='Journey']
                    /div[@class='ColumnOne']
                       /input[@type='radio']
                ]
              ) = 1
        ]
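
Since a node set cannot carry the grouping itself, the grouping has to be done in the host language. A minimal sketch in Python, assuming the markup is first made well-formed (the inputs are self-closed here so the standard library XML parser accepts it; real HTML would need an HTML parser). `SAMPLE` and `group_journeys` are hypothetical names used only for illustration, and the sample is a trimmed-down stand-in for the structure in the question:

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed-down sample. Inputs are self-closed so that the
# stdlib XML parser accepts the markup.
SAMPLE = """
<div class='TableHolder'>
  <h2>Outbound Options</h2>
  <div class='JourneyOptions'><div class='Journey'>
    <div class='ColumnOne'><input type='radio' name='out' value='1'/></div>
  </div></div>
  <div class='JourneyOptions'><div class='Journey'>
    <div class='ColumnOne'/>
  </div></div>
  <div class='JourneyOptions'><div class='Journey'>
    <div class='ColumnOne'><input type='radio' name='out' value='2'/></div>
  </div></div>
  <h2>Inbound Options</h2>
  <div class='JourneyOptions'><div class='Journey'>
    <div class='ColumnOne'><input type='radio' name='in' value='1'/></div>
  </div></div>
  <div class='JourneyOptions'><div class='Journey'>
    <div class='ColumnOne'/>
  </div></div>
</div>
"""


def group_journeys(root):
    """Return a list of (direction, value, rows), one entry per journey set."""
    groups = []
    # Rows come back in document order, so a single pass is enough.
    for row in root.findall("./div[@class='JourneyOptions']"):
        radio = row.find(".//input[@type='radio']")
        if radio is not None:
            # A row with a radio input starts a new set.
            groups.append((radio.get("name"), radio.get("value"), [row]))
        elif groups:
            # A row without one is a connection in the current set.
            groups[-1][2].append(row)
    return groups


groups = group_journeys(ET.fromstring(SAMPLE.strip()))
for direction, value, rows in groups:
    print(direction, value, len(rows))
```

The radio input's `name` attribute (`out` or `in`) separates outbound from inbound sets, and the loop ends when the rows run out, so there is no need to know in advance how many sets exist.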