I would like to select the table row containing the text "Accounts Payable", but I'm not getting anywhere with what I've tried, and I'm mostly guessing with findall. Can someone show me how to do this?
For example, this is what I start with:
<div>
<tr>
<td class="lft lm">Accounts Payable
</td>
<td class="r">222.82</td>
<td class="r">92.54</td>
<td class="r">100.34</td>
<td class="r rm">99.95</td>
</tr>
<tr>
<td class="lft lm">Accrued Expenses
</td>
<td class="r">36.49</td>
<td class="r">33.39</td>
<td class="r">31.39</td>
<td class="r rm">36.47</td>
</tr>
</div>
And this is what I would like to get as a result:
<tr>
<td class="lft lm">Accounts Payable
</td>
<td class="r">222.82</td>
<td class="r">92.54</td>
<td class="r">100.34</td>
<td class="r rm">99.95</td>
</tr>
You can select the td elements with class lft lm and then examine the element.string to determine if you have the "Accounts Payable" td:
# BeautifulSoup 4 (pip install beautifulsoup4); the camelCase methods
# findAll and findNextSibling remain available there as legacy aliases
from bs4 import BeautifulSoup

# where so_soup.txt is your html
with open("so_soup.txt", "r") as f:
    soup = BeautifulSoup(f.read(), "html.parser")

# Match cells whose class attribute is exactly "lft lm"
cells = soup.findAll('td', {"class": "lft lm"})
for cell in cells:
    # You can compare cell.string against "Accounts Payable"
    print(cell.string)
To examine the cells that follow the "Accounts Payable" cell, for instance, you can walk its next siblings:
for cell in cells:
    # cell.string is None when a cell holds more than one child node
    if cell.string and cell.string.strip() == "Accounts Payable":
        sibling = cell.findNextSibling()
        while sibling:
            print("\t" + sibling.string)
            sibling = sibling.findNextSibling()
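The sibling walk above can also be written more compactly. The following is only a sketch, assuming BeautifulSoup 4 (`bs4`) and the markup from the question; `row_values` is a hypothetical helper name, not part of the library. It collects the numbers that follow a labelled cell into a list of floats.

```python
from bs4 import BeautifulSoup

# Sample row copied from the question.
HTML = """<tr>
<td class="lft lm">Accounts Payable
</td>
<td class="r">222.82</td>
<td class="r">92.54</td>
<td class="r">100.34</td>
<td class="r rm">99.95</td>
</tr>"""


def row_values(html, label):
    """Hypothetical helper: floats from the cells that follow the label cell."""
    soup = BeautifulSoup(html, "html.parser")
    # class_="lft" matches any <td> whose class list includes "lft".
    for cell in soup.find_all("td", class_="lft"):
        if cell.get_text(strip=True) == label:
            # find_next_siblings("td") returns only the <td> tags after this cell.
            return [float(td.get_text(strip=True))
                    for td in cell.find_next_siblings("td")]
    return []


values = row_values(HTML, "Accounts Payable")
print(values)
```

Using `get_text(strip=True)` instead of `.string` avoids a `None` when a cell contains nested tags.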
Update for Edit
If you would like to write out the original HTML for just the "Accounts Payable" cell and the siblings that follow it, this is the code for that:
lines = ["<tr>"]
for cell in cells:
    # Only emit the row whose first cell is "Accounts Payable"
    if cell.string and cell.string.strip() == "Accounts Payable":
        lines.append(cell.prettify())
        sibling = cell.findNextSibling()
        while sibling:
            lines.append(sibling.prettify())
            sibling = sibling.findNextSibling()
lines.append("</tr>")

with open("so_soup_out.txt", "wt") as f:
    f.writelines(lines)
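If the goal is the whole row rather than its individual cells, another option is to find the label cell and step up to its enclosing `<tr>`. This is a minimal sketch under the same assumptions as above (BeautifulSoup 4, the markup from the question); `find_row` is a hypothetical helper name introduced for illustration.

```python
from bs4 import BeautifulSoup

# Sample markup copied from the question.
HTML = """<div>
<tr>
<td class="lft lm">Accounts Payable
</td>
<td class="r">222.82</td>
<td class="r">92.54</td>
<td class="r">100.34</td>
<td class="r rm">99.95</td>
</tr>
<tr>
<td class="lft lm">Accrued Expenses
</td>
<td class="r">36.49</td>
<td class="r">33.39</td>
<td class="r">31.39</td>
<td class="r rm">36.47</td>
</tr>
</div>"""


def find_row(html, label):
    """Hypothetical helper: the <tr> whose label cell text equals label, or None."""
    # html.parser is used here so the bare <tr>/<td> tags (no <table>) are kept.
    soup = BeautifulSoup(html, "html.parser")
    for cell in soup.find_all("td", class_="lft"):
        if cell.get_text(strip=True) == label:
            # Step up from the matching cell to the row that contains it.
            return cell.find_parent("tr")
    return None


row = find_row(HTML, "Accounts Payable")
print(row)
```

Printing `row` gives the row's markup; `row.find_all("td")` gives its cells if you need them separately.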