BeautifulSoup not working

Developer https://www.devze.com 2023-01-23 10:17 Source: Internet
I am trying to import the content of my blog using BeautifulSoup, with the code given below:

    import urllib2
    from BeautifulSoup import BeautifulSoup
    response = urllib2.urlopen('http://www.bugsandbrains.blogspot.com')
    html = response.read()
    soup = BeautifulSoup(html)

Everything worked fine two or three times; after that it started throwing HTMLParseError. I find it highly unlikely that the structure of the page changed within a few minutes. What else might be causing this problem?

I am enclosing the traceback as well.

 Traceback (most recent call last):
  File "<console>", line 1, in <module>
  File "/usr/lib/pymodules/python2.6/BeautifulSoup.py", line 1499, in __init__
    BeautifulStoneSoup.__init__(self, *args, **kwargs)
  File "/usr/lib/pymodules/python2.6/BeautifulSoup.py", line 1230, in __init__
    self._feed(isHTML=isHTML)
  File "/usr/lib/pymodules/python2.6/BeautifulSoup.py", line 1263, in _feed
    self.builder.feed(markup)
  File "/usr/lib/python2.6/HTMLParser.py", line 108, in feed
    self.goahead(0)
  File "/usr/lib/python2.6/HTMLParser.py", line 150, in goahead
    k = self.parse_endtag(i)
  File "/usr/lib/python2.6/HTMLParser.py", line 317, in parse_endtag
    self.error("bad end tag: %r" % (rawdata[i:j],))
  File "/usr/lib/python2.6/HTMLParser.py", line 115, in error
    raise HTMLParseError(message, self.getpos())
HTMLParseError: bad end tag: u"</scr' + 'ipt>", at line 1152, column 16


I just tried your code on Windows with:

  • Python: 2.6 (same as yours)
  • BeautifulSoup: 3.0.8.1 (latest)

I can't reproduce this. Are you using the latest code from the 3.0 series, which is meant for Python 2.6, and not the 3.1 series, which is for Python 3 [0]? Sorry, but I can't think of any other clues right now.

[0] http://www.crummy.com/software/BeautifulSoup/#Download
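
One more thing that might help narrow it down: the traceback reports a position (line 1152, column 16), so it could be worth looking at what the fetched HTML contains there. The following is only a minimal sketch; `show_context` and the `sample` string are hypothetical names made up for illustration, and it assumes the parser counts lines from 1 and columns from 0, as `HTMLParser.getpos()` does.

```python
def show_context(markup, line, column, width=40):
    """Return the text around a parser position.

    `line` is 1-based and `column` is 0-based, matching what
    HTMLParser.getpos() reports in the error message.
    """
    # HTMLParser counts lines by "\n", so split the same way.
    lines = markup.split("\n")
    if not 1 <= line <= len(lines):
        raise ValueError("line %d is outside the document" % line)
    text = lines[line - 1]
    start = max(0, column - width)
    return text[start:column + width]


# Stand-in for the fetched page; in the question this would be `html`.
sample = "<html>\n<script>\ndocument.write('</scr' + 'ipt>');\n</script>\n</html>"
print(show_context(sample, 3, 16, width=20))
```

With the code from the question, the call would be `show_context(html, 1152, 16)`. If that shows an end tag assembled by string concatenation inside a script, the page content itself, rather than the BeautifulSoup version, is the likely trigger.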


I have tried your code, and it works. My environment: ActivePython 2.6.6.15, BeautifulSoup 3.0.8.1. I printed the soup variable and it contains the content of "Boredom Induced Post". When I opened http://www.bugsandbrains.blogspot.com in browsers, they showed a Wave Sandbox login page. No clue about what is wrong :(
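
A possible workaround, offered as a guess rather than a confirmed fix: the error message complains about the end tag `</scr' + 'ipt>`, which looks like JavaScript building a script tag by string concatenation. If the script bodies are not needed, they could be removed before the markup reaches the parser. The sketch below assumes a regular expression is good enough for this page; `strip_scripts`, `SCRIPT_RE` and the `sample` string are hypothetical names for illustration.

```python
import re

# Matches a whole <script ...>...</script> element, case-insensitively,
# including bodies that span several lines.
SCRIPT_RE = re.compile(r"<script\b[^>]*>.*?</script\s*>", re.IGNORECASE | re.DOTALL)


def strip_scripts(markup):
    """Return markup with every <script> element removed."""
    return SCRIPT_RE.sub("", markup)


# Stand-in for a page whose script writes another script tag in pieces.
sample = (
    "<html><body>"
    "<script>document.write('<scr' + 'ipt src=\"x.js\"></scr' + 'ipt>');</script>"
    "<p>Hello</p>"
    "</body></html>"
)
print(strip_scripts(sample))
```

In the question's code this would become `soup = BeautifulSoup(strip_scripts(html))`. A regular expression is not a real HTML parser, so this can misfire on unusual markup (for example a literal `</script>` inside a JavaScript string); treat it as a quick experiment to see whether the scripts are what breaks the parse.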
