How can I get all html following a searched item using BeautifulSoup in Python? [closed]

Developer https://www.devze.com 2023-02-28 07:45 Source: Internet
It's difficult to tell what is being asked here. This question is ambiguous, vague, incomplete, overly broad, or rhetorical and cannot be reasonably answered in its current form. For help clarifying this question so that it can be reopened, visit the help center. Closed 11 years ago.

I am trying to return all of the HTML that follows a searched text string using BeautifulSoup in Python. Here is my code:

html = '<html>table1<table><tr>text1<td>text2</td></tr></table>table2<table><tr>text3<td>text4</td></tr></table></html>'
soup = BeautifulSoup(''.join(html))
foundtext = soup.find(text='text1')
soup2 = foundtext.findAll()

This code gives me an error. In soup2, I would like to have:

<td>text2</td></tr></table>table2<table><tr>text3<td>text4</td></tr></table></html>

which is all html code following the phrase 'text1'.


The following code prints the nodes that come after the first occurrence of text1:

# BeautifulSoup 3 import path (Python 2 era)
from BeautifulSoup import BeautifulSoup, NavigableString

html = '<html>table1<table><tr>text1<td>text2</td></tr></table>table2<table><tr>text3<td>text4</td></tr></table></html>'
soup = BeautifulSoup(html)

# Walk the tree in document order; once the text node 'text1' has been
# seen, print every node that follows it.
found = False
for node in soup.recursiveChildGenerator():
    if found:
        print(node)
    if isinstance(node, NavigableString) and node == 'text1':
        found = True


> suxmac2:tmp ajung$ bin/python out
> <td>text2</td>
> text2
> table2
> <table><tr>text3<td>text4</td></tr></table>
> <tr>text3<td>text4</td></tr>
> text3
> <td>text4</td>
> text4

Adjusting the code to your further needs is up to you; we have already helped you several times. Once again: read the BeautifulSoup documentation. You have been given the link numerous times by now.
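
For readers on Python 3, here is a minimal sketch of the same traversal. It assumes the bs4 package (BeautifulSoup 4) is installed and uses the standard library's html.parser backend. The helper name nodes_after is hypothetical and not part of BeautifulSoup.

```python
from bs4 import BeautifulSoup, NavigableString

html = ('<html>table1<table><tr>text1<td>text2</td></tr></table>'
        'table2<table><tr>text3<td>text4</td></tr></table></html>')
soup = BeautifulSoup(html, 'html.parser')


def nodes_after(soup, needle):
    """Collect every node that follows the first text node equal to `needle`.

    Hypothetical helper: nodes are visited in document order, so nested
    tags and their text show up as separate entries.
    """
    found = False
    result = []
    for node in soup.descendants:
        if found:
            result.append(node)
        elif isinstance(node, NavigableString) and node == needle:
            found = True
    return result


after = nodes_after(soup, 'text1')
for node in after:
    print(node)
```

As with the original loop, this yields a flat list of nodes rather than one contiguous HTML string, so a tag and its children each appear as their own entry.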


I believe that is not possible directly, as BeautifulSoup keeps the parsed HTML as a tree structure. What you could do is extract all unwanted elements using http://www.crummy.com/software/BeautifulSoup/documentation.html#Removing%20elements , which would return the HTML in front of your search string as well.

Apart from that, you could also work with the HTML snippet of the element that you searched for. According to the BeautifulSoup documentation, find returns the matching node, which can be converted to an HTML string. Use that together with simple Python string-searching methods to cut away everything up to the end of the found string. That will probably require more manual work, and is basically like combining the answer to "How can I get all html following a searched item using BeautifulSoup in Python?" with BeautifulSoup's search method.
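
The string-cutting idea can be sketched with plain Python string methods. This is only an illustration under stated assumptions: the search text must appear verbatim in the raw markup (no entities, and no earlier match inside a tag or attribute), and html_after is a hypothetical helper name. It works on the raw string and does not use BeautifulSoup at all.

```python
def html_after(markup, needle):
    """Return the part of `markup` that follows the first occurrence of `needle`.

    Hypothetical helper; assumes `needle` appears verbatim in the raw markup.
    """
    # str.partition splits at the first match; `sep` is empty if nothing matched.
    _before, sep, after = markup.partition(needle)
    if not sep:
        raise ValueError('search text not found: %r' % needle)
    return after


html = ('<html>table1<table><tr>text1<td>text2</td></tr></table>'
        'table2<table><tr>text3<td>text4</td></tr></table></html>')
remainder = html_after(html, 'text1')
print(remainder)
```

For the sample markup from the question, the remainder is the string the asker wanted in soup2. Because it is cut from the middle of a document, it contains closing tags without their openers and is not well-formed HTML on its own.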
