I tried this:
s = soup.findAll("table", {"class": "view"})
But that gives me the outer table, and I need the table inside that table.
<table class="view" >
<tr>
<td width="46%" valign="top">
<table>
<tr>
<td>
<div style="adasdasd">
<div class="abc">dasdsadasdasdas</div>
</div>
<div>
<span><span class="aaaaaaa " title="aaaaaaaaaaa"><span>aaaaaaaaaaaaa</span></span> </span>
<b>My Face</b><br />
Hello This is me,
</div>
<div class="abc"">
Dec 6, 2010 by Alis
</div>
</td>
</tr>
</table>
</tr>
</table>
The things I want to scrape are:
Hello This is me,
My Face
Dec 6, 2010 by Alis
s = soup.findAll("table", {"class": "view"})[0].find("table")
If there's just the one table, you could use .find for the first one too, and drop the [0].
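To illustrate the point about .find versus indexing into the result list, here is a minimal sketch. It assumes the bs4 package (where findAll is spelled find_all) and the built-in "html.parser"; the trimmed-down HTML and the variable names are made up for the example.

```python
from bs4 import BeautifulSoup

# Hypothetical, trimmed-down markup: one outer table with one nested table.
html = """
<table class="view">
  <tr>
    <td>
      <table>
        <tr><td>inner cell</td></tr>
      </table>
    </td>
  </tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")

# find() returns the first match directly (or None), so no [0] is needed.
outer = soup.find("table", {"class": "view"})
inner = outer.find("table") if outer is not None else None

# Equivalent to the list-and-index form from the answer above.
same = soup.find_all("table", {"class": "view"})[0].find("table")
```

The find form also avoids an IndexError when nothing matches, since it returns None instead of an empty list.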
Here's some better-formatted HTML:
<table class="view" >
<tr>
<td width="46%" valign="top">
<table>
<tr>
<td>
<div style="adasdasd">
<div class="abc">dasdsadasdasdas</div>
</div>
<div>
<span>
<span class="aaaaaaa " title="aaaaaaaaaaa">
<span>aaaaaaaaaaaaa</span>
</span>
</span>
<b>My Face</b>
<br />
Hello This is me,
</div>
<div class="abc">
Dec 6, 2010 by Alis
</div>
</td>
</tr>
</table>
</td>
</tr>
</table>
Note: I added a closing </td> tag because the original was missing one.
innerTable = soup.find("table", {"class": "view"}).tr.td.table  # the table in the first cell of the first row
innerDiv = innerTable.find("div", {"style": "adasdasd"}).findNextSibling("div")  # the div in which all of your content resides; a bare .nextSibling would return the whitespace text node between the two divs
That gets you to the div that holds all of your content. From there it's just a little bit of parsing to get the content you actually need.
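One possible way to do that last bit of parsing is sketched below. It assumes the bs4 package with the built-in "html.parser" and the cleaned-up HTML from this answer; the variable names (title, message, byline) are only illustrative.

```python
from bs4 import BeautifulSoup, NavigableString

html = """
<table class="view">
<tr>
<td width="46%" valign="top">
<table>
<tr>
<td>
<div style="adasdasd">
<div class="abc">dasdsadasdasdas</div>
</div>
<div>
<span><span class="aaaaaaa " title="aaaaaaaaaaa"><span>aaaaaaaaaaaaa</span></span> </span>
<b>My Face</b><br />
Hello This is me,
</div>
<div class="abc">
Dec 6, 2010 by Alis
</div>
</td>
</tr>
</table>
</td>
</tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")
inner_table = soup.find("table", {"class": "view"}).find("table")

# The content div is the first <div> sibling after the styled one.
styled_div = inner_table.find("div", {"style": "adasdasd"})
content_div = styled_div.find_next_sibling("div")

# "My Face" sits inside the <b> tag.
title = content_div.b.get_text(strip=True)

# "Hello This is me," is loose text directly inside the div, so keep only
# the direct children that are strings and drop whitespace-only ones.
pieces = (
    child.strip()
    for child in content_div.children
    if isinstance(child, NavigableString)
)
message = " ".join(piece for piece in pieces if piece)

# The date line is in the next sibling div with class "abc".
byline = content_div.find_next_sibling("div", {"class": "abc"}).get_text(strip=True)

print(title)
print(message)
print(byline)
```

This relies on the three divs being siblings in exactly this order; if the real page varies, the sibling lookups would need to be adjusted.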