Use BeautifulSoup to extract sibling nodes between two nodes

Developer https://www.devze.com 2022-12-24 04:07 Source: the web
I've got a document like this:

<p class="top">I don't want this</p>

<p>I want this</p>
<table>
   <!-- ... -->
</table>

<img ... />

<p> and all that stuff too</p>

<p class="end">But not this and nothing after it</p>

I want to extract everything between the p[class=top] and p[class=end] paragraphs.

Is there a nice way I can do this with BeautifulSoup?


The node.nextSibling attribute is what you need:

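from BeautifulSoup import BeautifulSoup

soup = BeautifulSoup(html)

# Start at the first sibling after <p class="top">, so the top paragraph itself is skipped
nextNode = soup.find('p', {'class': 'top'}).nextSibling
while nextNode is not None:
    # Stop at <p class="end">; string nodes have no tag name, so they never match
    if getattr(nextNode, 'name', None) == 'p' and nextNode.get('class', None) == 'end':
        break
    # process nextNode here
    nextNode = nextNode.nextSibling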

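The condition looks complicated because a sibling can be a plain string node (for example, the whitespace between tags) as well as a tag. The getattr call makes sure the name and class attributes are only read from an HTML tag, never from a string node.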
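
For anyone on the newer bs4 package, the same sibling walk might look like the sketch below. It is only an illustration under a few assumptions: the helper name siblings_between and the sample HTML are made up for this example, the built-in "html.parser" backend is used, and the paragraphs are assumed to be direct siblings of each other. In bs4 the class attribute comes back as a list, so the end test checks membership instead of comparing against a string.

```python
from bs4 import BeautifulSoup, Tag


def siblings_between(soup, start_class="top", end_class="end"):
    """Collect the nodes after <p class=start_class> up to, but not including, <p class=end_class>.

    Hypothetical helper for illustration; returns an empty list if the start paragraph is missing.
    """
    start = soup.find("p", class_=start_class)
    if start is None:
        return []
    collected = []
    for node in start.next_siblings:
        # Only Tag objects have a name and attributes; string nodes are collected as-is.
        if isinstance(node, Tag) and node.name == "p" and end_class in node.get("class", []):
            break
        collected.append(node)
    return collected


# Sample document modelled on the one in the question (contents are assumed).
html = """<p class="top">I don't want this</p>
<p>I want this</p>
<table><tr><td>cell</td></tr></table>
<p> and all that stuff too</p>
<p class="end">But not this and nothing after it</p>
<p>nor this</p>"""

soup = BeautifulSoup(html, "html.parser")
nodes = siblings_between(soup)
tags = [n.name for n in nodes if isinstance(n, Tag)]
extracted = "".join(str(n) for n in nodes)
```

Here nodes also contains the whitespace strings between the tags; filter with isinstance(n, Tag) if only elements are wanted.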
