finding a tag based on what it surrounds (using beautifulsoup)

Developer https://www.devze.com 2023-03-11 08:45, source: web

I'm using BeautifulSoup to parse some HTML. Let's say I have the following HTML in a BeautifulSoup object called soup:

<td class="1">test1</td>
<td>test2</td>
<td class="3"><a href="/">test3</a></td>
<td><div class="test4"><a class="test4" href="/">test4</a></div></td>
<td><div class="test4"><a class="test4" href="/">test4</a></div></td>

I can get all 'td' tags with:

soup.findAll("td")

But how can I find only the 'td' tags that surround divs with class test4? Or that surround 'a' tags with class test4?

I know I can locate tags with attributes, such as:

soup.findAll("a", {"class":"test4"})

But I need to combine this with the initial 'td' search so that I throw out all 'td' tags that don't surround the 'a' or 'div' tags.

Ideas? Thanks!
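With Beautiful Soup 4.7 or later, which uses the soupsieve library for CSS selectors, "a td that surrounds X" can be written directly with the CSS `:has()` pseudo-class. This is a minimal sketch, assuming bs4 is installed and the sample HTML from the question is parsed with the built-in "html.parser" (which leaves the bare td elements as they are); the variable names are just for illustration.

```python
from bs4 import BeautifulSoup

HTML = """<td class="1">test1</td>
<td>test2</td>
<td class="3"><a href="/">test3</a></td>
<td><div class="test4"><a class="test4" href="/">test4</a></div></td>
<td><div class="test4"><a class="test4" href="/">test4</a></div></td>
"""

soup = BeautifulSoup(HTML, "html.parser")

# :has() keeps a <td> only if some descendant (at any depth) matches the inner selector.
tds_with_div = soup.select("td:has(div.test4)")
tds_with_a = soup.select("td:has(a.test4)")

# A top-level selector list matches either condition; each <td> is returned once,
# in document order.
tds_with_either = soup.select("td:has(div.test4), td:has(a.test4)")

for td in tds_with_either:
    print(td)
```

Use `td:has(> div.test4)` instead if the div must be a direct child of the td.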

This only works if the immediate parent of the test4 element is a td, but it should give you an idea of how to make a more complex query:

>>> from bs4 import BeautifulSoup
>>> soup = BeautifulSoup('''<td class="1">test1</td>
... <td>test2</td>
... <td class="3"><a href="/">test3</a></td>
... <td><div class="test4"><a class="test4" href="/">test4</a></div></td>
... <td><div class="test4"><a class="test4" href="/">test4</a></div></td>
... ''', 'html.parser')
>>> [tag.parent for tag in soup.find_all(attrs={"class": "test4"})
...  if tag.name in ['a', 'div'] and tag.parent.name == 'td']
[<td><div class="test4"><a class="test4" href="/">test4</a></div></td>, <td><div class="test4"><a class="test4" href="/">test4</a></div></td>]
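The immediate-parent restriction of the approach above can be lifted by walking up from each test4 element to its nearest td ancestor with `find_parent`. This is a minimal sketch, assuming bs4 is installed; the helper `enclosing_tds` and the extra nested `<span>` row are hypothetical, added only to show a test4 div that is not a direct child of its td. Because bs4 compares tags by content, two identical td elements compare equal, so duplicates are skipped by identity rather than with `in`.

```python
from bs4 import BeautifulSoup

HTML = """<td class="1">test1</td>
<td class="3"><a href="/">test3</a></td>
<td><div class="test4"><a class="test4" href="/">test4</a></div></td>
<td><div class="test4"><a class="test4" href="/">test4</a></div></td>
<td><span><div class="test4">nested deeper</div></span></td>
"""

soup = BeautifulSoup(HTML, "html.parser")


def enclosing_tds(soup, names=("a", "div"), css_class="test4"):
    """Hypothetical helper: each <td> containing a matching tag at any depth, once."""
    found = []
    for tag in soup.find_all(list(names), class_=css_class):
        td = tag.find_parent("td")  # nearest <td> ancestor, or None
        # Tag equality is content-based in bs4, so dedupe by object identity.
        if td is not None and not any(td is seen for seen in found):
            found.append(td)
    return found


result = enclosing_tds(soup)
for td in result:
    print(td)
```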

This is how I would do it:

>>> tdList = []
>>> for td in soup.find_all('td'):
...     # append the td itself, once, if it contains a matching div at any depth
...     if td.find('div', {'class': 'test4'}) is not None:
...         tdList.append(td)
... 
>>> tdList
[<td><div class="test4"><a class="test4" href="/">test4</a></div></td>, <td><div class="test4"><a class="test4" href="/">test4</a></div></td>]

Of course you could increase the granularity as much as needed, but for the provided html this gets the job done.
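The same "td that surrounds a test4 tag" test can also be handed straight to `find_all`, which accepts a function as its filter and calls it once per tag. This is a minimal sketch, assuming bs4 is installed; the function name `surrounds_test4` is hypothetical. It covers both the div and the a case from the question, and since each td is visited once, none is returned twice.

```python
from bs4 import BeautifulSoup

HTML = """<td class="1">test1</td>
<td>test2</td>
<td class="3"><a href="/">test3</a></td>
<td><div class="test4"><a class="test4" href="/">test4</a></div></td>
<td><div class="test4"><a class="test4" href="/">test4</a></div></td>
"""

soup = BeautifulSoup(HTML, "html.parser")


def surrounds_test4(tag):
    """True for a <td> with a descendant <a> or <div> that has class test4."""
    return tag.name == "td" and tag.find(["a", "div"], class_="test4") is not None


td_list = soup.find_all(surrounds_test4)
for td in td_list:
    print(td)
```

Narrowing the list to `["div"]` or `["a"]` matches only one kind of surrounded tag.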