Extract content within a tag with BeautifulSoup

开发者 (Developer) https://www.devze.com 2023-03-06 07:37 Source: the web

Related topic: python

I'd like to extract the content Hello world. Please note that there are multiple <table> elements and similar <td colspan="2"> cells on the page as well:

<table border="0" cellspacing="2" width="800">
  <tr>
    <td colspan="2"><b>Name: </b>Hello world</td>
  </tr>
  <tr>
...

I tried the following:

hello = soup.find(text='Name: ')
hello.findPreviousSiblings

But it returned nothing.

In addition, I'm also having a problem extracting My home address from the following:

<td><b>Address:</b></td>

<td>My home address</td>

I'm using the same method to search for text="Address: ", but how do I navigate to the next <td> and extract its content?

The contents attribute works well for extracting the text from <tag>text</tag>. Note that it returns a list of the tag's children, not a plain string.

<td>My home address</td> example:

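from bs4 import BeautifulSoup

s = '<td>My home address</td>'
soup = BeautifulSoup(s, 'html.parser')  # name the parser explicitly
td = soup.find('td')  # <td>My home address</td>
td.contents  # ['My home address'] (a list of child nodes)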
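
Since contents gives back a list, it may help to see it next to the other text accessors. This is only a small sketch using bs4 with the built-in 'html.parser'; the variable names are made up for illustration.

```python
from bs4 import BeautifulSoup

html = '<td>My home address</td>'
soup = BeautifulSoup(html, 'html.parser')
td = soup.find('td')

contents = td.contents        # list of direct children
first_child = td.contents[0]  # a NavigableString (a str subclass)
as_string = td.string         # the single string child; None when the tag has more than one child
as_text = td.get_text()       # all descendant text joined together

print(contents)
print(as_string)
print(as_text)
```

For a cell that holds nothing but text, all three end up at the same string; they only start to differ once the cell contains nested tags.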

<td><b>Address:</b></td> example:

from bs4 import BeautifulSoup

s = '<td><b>Address:</b></td>'
soup = BeautifulSoup(s, 'html.parser')
b = soup.find('td').find('b')  # <b>Address:</b>
b.contents  # ['Address:'] (a list of child nodes)
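
The two examples above look at each cell on its own. To get from the Address: label to the cell that follows it, one possible approach is to find the <b> tag, climb to its <td>, and ask for the next <td> sibling. This is a sketch, not the only way: the <table>/<tr> wrapper is assumed (the question shows only the two cells), and the string= argument needs bs4 4.4.0 or later (older code spells it text=).

```python
from bs4 import BeautifulSoup

# Assumption: the two cells sit in the same row. The wrapper markup is hypothetical.
html = """
<table>
  <tr>
    <td><b>Address:</b></td>
    <td>My home address</td>
  </tr>
</table>
"""
soup = BeautifulSoup(html, 'html.parser')

# Exact match on the label text; note there is no trailing space in this markup.
label = soup.find('b', string='Address:')

# Climb to the enclosing cell, then step to the next <td> in the same row.
label_cell = label.find_parent('td')
value_cell = label_cell.find_next_sibling('td')

address = value_cell.get_text(strip=True)
print(address)
```

find_next_sibling('td') skips the whitespace text nodes between the cells, which a bare next_sibling would return first.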

Use next instead (bs4 calls it next_element):

>>> s = '<table border="0" cellspacing="2" width="800"><tr><td colspan="2"><b>Name: </b>Hello world</td></tr><tr>'
>>> soup = BeautifulSoup(s)
>>> hello = soup.find(text='Name: ')
>>> hello.next
u'Hello world'

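next and previous let you move through the document's elements in the order the parser processed them, while the sibling methods only move between nodes that share the same parent in the parse tree. Here 'Name: ' is the only child of <b>, so it has no siblings at all; 'Hello world' is a sibling of the <b> tag itself.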
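
To make the difference concrete, here is a small sketch of the same lookup written with bs4 names (string= instead of text=, next_element instead of next). The closing </table> is added so the snippet is well-formed; the variable names are only for illustration.

```python
from bs4 import BeautifulSoup

html = ('<table border="0" cellspacing="2" width="800">'
        '<tr><td colspan="2"><b>Name: </b>Hello world</td></tr></table>')
soup = BeautifulSoup(html, 'html.parser')

label_text = soup.find(string='Name: ')  # the NavigableString inside <b>
label_tag = label_text.parent            # the <b> tag itself

via_parse_order = label_text.next_element  # next node in document order
via_tree = label_tag.next_sibling          # next node under the same <td>
no_sibling = label_text.next_sibling       # None: the string is the only child of <b>

print(via_parse_order)
print(via_tree)
print(no_sibling)
```

Both routes land on 'Hello world'; the sibling route just has to start from the <b> tag rather than from the string inside it.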

Use the code below to extract the text from HTML tags with Python and BeautifulSoup:

from bs4 import BeautifulSoup

s = '<td>Example information</td>'  # your raw HTML
soup = BeautifulSoup(s, 'html.parser')  # parse the HTML with BeautifulSoup
td = soup.find('td')  # tag of interest: <td>Example information</td>
td.text  # 'Example information' (text with the markup stripped)
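
On the page from the question, td.text for the Name cell would include the label as well ('Name: Hello world'), and there are several similar cells. One way to cope with both, sketched under assumptions: loop over every <td colspan="2">, keep the one whose text starts with the label, and slice the label off. The two-table page and the helper name value_for_label are hypothetical.

```python
from bs4 import BeautifulSoup

# Hypothetical page: two tables, each with a similar <td colspan="2"> cell.
html = """
<table><tr><td colspan="2"><b>Title: </b>Something else</td></tr></table>
<table border="0" cellspacing="2" width="800">
  <tr><td colspan="2"><b>Name: </b>Hello world</td></tr>
</table>
"""
soup = BeautifulSoup(html, 'html.parser')


def value_for_label(soup, label):
    """Return the text after `label` in the first matching <td colspan="2">, or None."""
    for td in soup.find_all('td', colspan='2'):
        text = td.get_text()
        if text.startswith(label):
            # Drop the label and any surrounding whitespace.
            return text[len(label):].strip()
    return None


name = value_for_label(soup, 'Name: ')
print(name)
```

This assumes the label is the first thing in the cell; if the real markup differs, starting from the <b> tag and reading its next sibling may be the safer route.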

from bs4 import BeautifulSoup, Tag

def get_tag_html(tag: Tag) -> str:
    """Return the inner HTML of tag: child tags are serialized, strings are kept as-is."""
    return ''.join(i.decode() if isinstance(i, Tag) else i for i in tag.contents)
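
For what it's worth, bs4 also has a built-in Tag.decode_contents() that seems to cover the same need as the helper above, at least for simple markup. A quick sketch on the cell from the question (variable names are illustrative):

```python
from bs4 import BeautifulSoup

html = '<td colspan="2"><b>Name: </b>Hello world</td>'
td = BeautifulSoup(html, 'html.parser').find('td')

inner_html = td.decode_contents()  # markup of the children only
outer_html = str(td)               # markup including the <td> itself

print(inner_html)
print(outer_html)
```

Use decode_contents() when you want the nested tags preserved, and get_text() when you only want the text.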
