Iterating in Python and BeautifulSoup

Developer https://www.devze.com 2023-02-15 04:49 Source: Internet
soup = BeautifulSoup(html).findAll('div', 'thread')
for i in soup:
    print i

I'm showing only this part of the code because that's where I'm stuck.

findAll returns a list of tags. I tried to use ' '.join() to get a single string, but it didn't work, because join expects strings, not Tag objects. I guess it's some sort of bug.
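The join failure described above is most likely not a bug: str.join() accepts only strings, and the search returns Tag objects. One way around it is to convert each tag with str() first. This is a minimal sketch, assuming Python 3 with the bs4 package (the newer BeautifulSoup 4, where findAll is spelled find_all); the html sample is made up for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, not taken from the original page.
html = """
<div class="thread"><a href="/t/1">First</a></div>
<div class="thread"><a href="/t/2">Second</a></div>
"""

threads = BeautifulSoup(html, "html.parser").find_all("div", class_="thread")

# str.join() only accepts strings, so convert each Tag explicitly.
joined = " ".join(str(tag) for tag in threads)
print(joined)
```

If only the visible text is wanted rather than the markup, tag.get_text() could be used in place of str(tag).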

When I iterate, it prints every item in the list to the screen, without the commas.

But what I want is the href of the a tag inside each div class="thread".

I tried many things, such as:

soup = BeautifulSoup(html).findAll('div', 'thread')
for i in soup:
    print BeautifulSoup(i)('a')['href']

The last snippet gives me: 'NoneType' object is not callable.

I've tried a lot of combinations but I'm truly stuck; I can't get it working at all. I don't know what to do after so many failed attempts. It's frustrating.


It should be something like

divs = BeautifulSoup(html).findAll('div', 'thread')
for div in divs:
    # Each div is already a parsed Tag, so search it directly.
    # tag['href'] reads the attribute; in BeautifulSoup 3, dict(tag.attrs)['href'] is equivalent.
    print div.find('a')['href']
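For readers on current versions, the same idea can be sketched as follows. This assumes Python 3 with the bs4 package rather than the BeautifulSoup 3 API used in the thread, and the html sample is invented for illustration. It also guards against a div that has no link, since find() returns None in that case.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, not taken from the original page.
html = """
<div class="thread"><a href="/t/1">First</a></div>
<div class="thread">no link here</div>
<div class="other"><a href="/skip">Skipped</a></div>
<div class="thread"><a href="/t/2">Second</a></div>
"""

soup = BeautifulSoup(html, "html.parser")

hrefs = []
for div in soup.find_all("div", class_="thread"):
    link = div.find("a")  # None when the div contains no <a> tag
    if link is not None and link.has_attr("href"):
        hrefs.append(link["href"])

print(hrefs)
```

Each item returned by find_all is already a parsed tag, so there is no need to wrap it in BeautifulSoup() again before searching inside it.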


Taking a look at the documentation for this module (http://www.crummy.com/software/BeautifulSoup/documentation.html), the second argument to findAll is normally a dictionary of attributes; a plain string is treated as a CSS class name. Have you tried the explicit form instead?

BeautifulSoup(html).findAll('div', { 'class': 'thread' })
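To check whether the string and dictionary forms really differ, the results can be compared directly. The sketch below assumes Python 3 with the bs4 package (not the BeautifulSoup 3 release the thread was written against) and uses a made-up html sample; class_ is the keyword spelling bs4 offers for the same filter.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, not taken from the original page.
html = """
<div class="thread">a</div>
<div class="post">b</div>
<div class="thread">c</div>
"""

soup = BeautifulSoup(html, "html.parser")

by_string = soup.find_all("div", "thread")            # string shorthand for a CSS class
by_dict = soup.find_all("div", {"class": "thread"})   # explicit attribute dictionary
by_keyword = soup.find_all("div", class_="thread")    # keyword form

texts = [div.get_text() for div in by_dict]
print(texts)
print(by_string == by_dict == by_keyword)
```

If all three forms select the same divs, the dictionary form is a matter of readability rather than a fix for the error in the question.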