soup = BeautifulSoup(html).findAll('div', 'thread')
for i in soup:
    print i
I'll show only this part of the code because that's where I'm getting stuck.
findAll returns a list. I tried ' '.join() to get a single string, but it didn't work because join expects strings, not tags. I guess it's some sort of bug.
Iterating over it prints every item in the list, without the commas.
But what I want is the href of the link inside each div class="thread".
I tried many things like
soup = BeautifulSoup(html).findAll('div', 'thread')
for i in soup:
    print BeautifulSoup(i)('a')['href']
The last code gives me: 'NoneType' object is not callable.
I've tried a lot of combinations but I'm really stuck and can't get it working at all. I don't know what else to try after so many failed attempts. It's frustrating.
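Each item that findAll returns is already a Tag, so there is no need to wrap it in BeautifulSoup again. It should be something like: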
divs = BeautifulSoup(html).findAll('div', 'thread')
for div in divs:
    # Index the tag directly to read an attribute; dict(div.find('a').attrs)['href'] should also work
    print div.find('a')['href']
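For anyone on Python 3, here is a minimal sketch of the same idea using the newer bs4 package, where findAll is spelled find_all. The sample HTML and the hrefs list are made up for illustration, and the extra None check is an assumption that some matching divs might not contain a link.

```python
from bs4 import BeautifulSoup

# Made-up sample markup, only for illustration
html = """
<div class="thread"><a href="/t/1">First</a></div>
<div class="thread"><span>no link here</span></div>
<div class="other"><a href="/x">Skip</a></div>
<div class="thread"><a href="/t/2">Second</a></div>
"""

soup = BeautifulSoup(html, "html.parser")

hrefs = []
for div in soup.find_all("div", "thread"):
    a = div.find("a")  # first <a> inside this div, or None if there is none
    if a is not None and a.has_attr("href"):
        hrefs.append(a["href"])

print(hrefs)
```

Skipping divs where find returns None avoids a TypeError when a thread div has no link in it.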
Taking a look at the documentation for this module (http://www.crummy.com/software/BeautifulSoup/documentation.html), the second argument to findAll is normally a dictionary of attributes rather than a string (a plain string is also accepted and is matched against the CSS class). Have you tried this instead:
BeautifulSoup(html).findAll('div', { 'class': 'thread' })
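A quick way to check how the second argument behaves is to compare the different spellings side by side. This is a small sketch assuming the bs4 package on Python 3 (where the method is find_all); the one-line HTML string and the variable names are invented for the test.

```python
from bs4 import BeautifulSoup

# Invented sample markup: one matching div and one non-matching div
html = '<div class="thread"><a href="/t/1">x</a></div><div class="other"></div>'
soup = BeautifulSoup(html, "html.parser")

by_string = soup.find_all("div", "thread")             # string: matched against the CSS class
by_dict = soup.find_all("div", {"class": "thread"})    # explicit attribute dictionary
by_keyword = soup.find_all("div", class_="thread")     # keyword form offered by bs4

for result in (by_string, by_dict, by_keyword):
    print(len(result), [div.a["href"] for div in result])
```

In this sketch all three calls select the same single div, so whichever form reads best can be used.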