
Python BeautifulSoup adding extra end tags

Developer (https://www.devze.com), 2023-01-11 19:01, source: Internet

I'm using BeautifulSoup to parse a website:

  import urllib2
  import BeautifulSoup  # BeautifulSoup 3, Python 2

  request = urllib2.Request(url)
  response = urllib2.urlopen(request)
  soup = BeautifulSoup.BeautifulSoup(response)

I'm using it to traverse a table. The problem is that BeautifulSoup adds an extra closing table tag to the HTML that doesn't exist in the original page, which I verified with print soup.prettify(). As a result, one of the td tags ends up outside the table and I can't select it.
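Rather than reading the prettify() output, you can check for the symptom programmatically by listing the cells that have no table ancestor. This is a minimal sketch that assumes BeautifulSoup 4 (the bs4 package) on Python 3 instead of the BeautifulSoup 3 / urllib2 setup above. The helper name cells_outside_tables and the sample markup are hypothetical; the sample only mimics the symptom and does not reproduce the real page.

```python
from bs4 import BeautifulSoup


def cells_outside_tables(html):
    """Return the text of every <td> that has no <table> ancestor.

    cells_outside_tables is a hypothetical helper, not part of bs4.
    """
    # "html.parser" is Python's built-in parser, so no extra install is needed.
    soup = BeautifulSoup(html, "html.parser")
    return [
        td.get_text(strip=True)
        for td in soup.find_all("td")
        # find_parent walks up the tree; None means the cell is not inside any table.
        if td.find_parent("table") is None
    ]


# Hypothetical markup that mimics the symptom: the second cell sits after </table>.
sample = "<table><tr><td>a</td></tr></table><td>b</td>"
print(cells_outside_tables(sample))  # ['b']
```

An empty list means every cell is still inside a table in the parsed tree.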


How about searching directly for each tag instead of trying to traverse into the table?

    for td in soup.findAll("td"):  # findAll, not find: find returns only the first match
        print td  # process each cell here
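Note that find returns only the first matching tag. Looping over its result therefore walks that single cell's children instead of every cell, whereas findAll (spelled find_all in bs4) returns all matches. Here is a minimal sketch of the difference, assuming bs4 on Python 3 and a made-up two-cell table:

```python
from bs4 import BeautifulSoup

html = "<table><tr><td>a</td><td>b</td></tr></table>"
soup = BeautifulSoup(html, "html.parser")

# find() returns only the FIRST matching Tag (or None if nothing matches).
first = soup.find("td")
# Iterating over a single Tag walks its children, here just the string "a".
print([str(child) for child in first])  # ['a']

# find_all() (findAll in BeautifulSoup 3) returns a list of every match.
cells = soup.find_all("td")
print([td.get_text() for td in cells])  # ['a', 'b']
```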

It's not unusual for a tbody tag to be inserted inside a table automatically even when it isn't in the source HTML. You can either code for it or jump straight to the tr or td tags.
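Whether a missing tbody breaks your code mostly depends on whether the search is recursive. This is a minimal sketch, assuming bs4 on Python 3. The sample table contains an explicit tbody, the same structure that browsers and HTML5-conformant parsers such as html5lib produce when the source omits it:

```python
from bs4 import BeautifulSoup

# Sample markup with an explicit <tbody> wrapping the rows.
html = "<table><tbody><tr><td>a</td></tr><tr><td>b</td></tr></tbody></table>"
soup = BeautifulSoup(html, "html.parser")
table = soup.find("table")

# Looking only at the table's direct children misses rows wrapped in <tbody>.
direct_rows = table.find_all("tr", recursive=False)
print(len(direct_rows))  # 0

# Searching all descendants finds the rows whether or not <tbody> is present.
all_rows = table.find_all("tr")
print([tr.td.get_text() for tr in all_rows])  # ['a', 'b']
```

With recursive=False, the search stops at the table's direct children, so it only works when no tbody sits in between. The default recursive search works in both cases.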
