BeautifulSoup question

开发者 (https://www.devze.com), 2023-01-30 02:55. Source: the web
<parent1>
    <span>Text1</span>
</parnet1>
<parent2>
    <span>Text2</span>
</parnet2>
<parent3>
    <span>Text3</span>
</parnet3>

I'm parsing this with Python and BeautifulSoup. I have a variable soupData that holds a reference to the object I need. How can I get a reference to parent2, for example, if all I have is the text Text2? So the problem is to filter span tags by their content. How can I do this?


After correcting the spelling on the end-tags:

[e for e in soup(recursive=False, text=False) if e.span.string == 'Text2']
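A minimal, self-contained sketch of the same idea, assuming bs4 with the built-in html.parser. That parser leaves the fragment unwrapped, whereas tree builders such as lxml or html5lib add <html> and <body> around it, which is one possible reason a top-level search can come back empty. The names MARKUP, top_level_tags and matched_names are illustrative only and do not come from the question.

```python
from bs4 import BeautifulSoup

# Markup from the question, with the end-tag spelling corrected.
MARKUP = """
<parent1><span>Text1</span></parent1>
<parent2><span>Text2</span></parent2>
<parent3><span>Text3</span></parent3>
"""

# Assumption: html.parser is used, so the parent tags remain
# direct children of the soup object (no <html>/<body> wrapper).
soup = BeautifulSoup(MARKUP, "html.parser")

# True matches every tag; recursive=False limits the search to direct children,
# which also skips the whitespace strings between the tags.
top_level_tags = soup.find_all(True, recursive=False)

matches = [
    tag for tag in top_level_tags
    if tag.span is not None and tag.span.string == "Text2"
]
matched_names = [tag.name for tag in matches]
print(matched_names)
```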


I don't think there's a way to do it in a single step. So:

for parenttag in soupData:
    if parenttag.span.string == "Text2":
        do_stuff(parenttag)
        break

A generator expression would also work, but it isn't much shorter.
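For reference, the generator-expression variant might look like the sketch below. It assumes bs4 with html.parser. Iterating over a soup object also yields the whitespace strings between tags, so the sketch skips anything that is not a Tag. The helper name first_parent_with_span_text is hypothetical.

```python
from bs4 import BeautifulSoup, Tag

MARKUP = """
<parent1><span>Text1</span></parent1>
<parent2><span>Text2</span></parent2>
<parent3><span>Text3</span></parent3>
"""

# Stand-in for the soupData variable mentioned in the question.
soup_data = BeautifulSoup(MARKUP, "html.parser")


def first_parent_with_span_text(container, wanted):
    """Return the first direct child tag whose <span> holds `wanted`, or None."""
    candidates = (
        child for child in container.children
        if isinstance(child, Tag)          # skip whitespace strings
        and child.span is not None         # skip tags without a <span>
        and child.span.string == wanted
    )
    # next() stops at the first match, like the break in the loop version.
    return next(candidates, None)


found = first_parent_with_span_text(soup_data, "Text2")
missing = first_parent_with_span_text(soup_data, "no such text")
```

Passing None as the default to next() avoids a StopIteration when nothing matches.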


Using Python 2.7.6 and BeautifulSoup 4.3.2, I found that Marcelo's answer gave an empty list. This worked for me, however:

[x.parent for x in bSoup.findAll('span') if x.text == 'Text2'][0]
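A related sketch lets BeautifulSoup do the text match itself. It assumes a bs4 release recent enough to accept the string keyword (older releases call it text) and uses html.parser. The variable names are illustrative.

```python
from bs4 import BeautifulSoup

MARKUP = """
<parent1><span>Text1</span></parent1>
<parent2><span>Text2</span></parent2>
<parent3><span>Text3</span></parent3>
"""

soup = BeautifulSoup(MARKUP, "html.parser")

# find() returns the first <span> whose .string equals "Text2", or None.
span = soup.find("span", string="Text2")

# Guard against "no match" instead of indexing into a possibly empty list.
parent = span.parent if span is not None else None
parent_name = parent.name if parent is not None else None
```

Unlike indexing a list with [0], this returns None rather than raising IndexError when no span matches.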

Alternatively, for a ridiculously overengineered solution (at least for this particular problem, though it may be useful if your filter criteria are too long to fit in an easily readable list comprehension), you could do:

def hasText(text):
    def hasTextFunc(x):
        return x.text == text
    return hasTextFunc

to create a function factory, then

hasTextText2 = hasText('Text2')

filter(hasTextText2,bSoup.findAll('span'))[0].parent

to get the reference to the parent tag that you were looking for.
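On Python 3, filter returns an iterator rather than a list, so indexing it with [0] fails. A hedged Python 3 sketch of the same factory idea, assuming bs4 with html.parser and using snake_case names that are not from the original answer:

```python
from bs4 import BeautifulSoup

MARKUP = """
<parent1><span>Text1</span></parent1>
<parent2><span>Text2</span></parent2>
<parent3><span>Text3</span></parent3>
"""


def has_text(text):
    """Build a predicate that checks a tag's text against `text`."""
    def predicate(tag):
        return tag.get_text() == text
    return predicate


soup = BeautifulSoup(MARKUP, "html.parser")
has_text_text2 = has_text("Text2")

# next() pulls the first match from the filter iterator; None if there is none.
first_span = next(filter(has_text_text2, soup.find_all("span")), None)
parent = first_span.parent if first_span is not None else None
```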
