I am trying to import the content of my blog using BeautifulSoup, with the code below:
import urllib2
from BeautifulSoup import BeautifulSoup
response=urllib2.urlopen('http://www.bugsandbrains.blogspot.com')
html=response.read()
soup=BeautifulSoup(html)
Everything worked fine two or three times; after that it started throwing HTMLParseError.
I think it is highly unlikely that the structure of the page changed within a few minutes. What else might be causing this problem?
I am enclosing the traceback as well:
Traceback (most recent call last):
  File "<console>", line 1, in <module>
  File "/usr/lib/pymodules/python2.6/BeautifulSoup.py", line 1499, in __init__
    BeautifulStoneSoup.__init__(self, *args, **kwargs)
  File "/usr/lib/pymodules/python2.6/BeautifulSoup.py", line 1230, in __init__
    self._feed(isHTML=isHTML)
  File "/usr/lib/pymodules/python2.6/BeautifulSoup.py", line 1263, in _feed
    self.builder.feed(markup)
  File "/usr/lib/python2.6/HTMLParser.py", line 108, in feed
    self.goahead(0)
  File "/usr/lib/python2.6/HTMLParser.py", line 150, in goahead
    k = self.parse_endtag(i)
  File "/usr/lib/python2.6/HTMLParser.py", line 317, in parse_endtag
    self.error("bad end tag: %r" % (rawdata[i:j],))
  File "/usr/lib/python2.6/HTMLParser.py", line 115, in error
    raise HTMLParseError(message, self.getpos())
HTMLParseError: bad end tag: u"</scr' + 'ipt>", at line 1152, column 16
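The failing token is a JavaScript string that splits a closing tag in two ('</scr' + 'ipt>'), a common trick in document.write snippets so that the literal text </script> never appears inside the script. Python 2.6's HTMLParser, which this traceback goes through, treats any "</" inside a script body as the start of an end tag and raises "bad end tag" when what follows is not a well-formed tag. One possible workaround, assuming you only need the visible blog content and not the scripts, is to strip <script> elements with a regular expression before handing the markup to BeautifulSoup. The helper name strip_scripts is hypothetical, and a regex is a blunt tool for HTML, so treat this as a stopgap sketch rather than a fix:

```python
import re

# Matches a whole <script ...>...</script> element, non-greedily and across
# newlines. The split string '</scr' + 'ipt>' does not match "</script", so the
# match runs on to the real closing tag.
SCRIPT_RE = re.compile(r"<script\b.*?</script\s*>", re.IGNORECASE | re.DOTALL)


def strip_scripts(html):
    """Hypothetical helper: drop <script> elements so a strict parser never sees their bodies."""
    return SCRIPT_RE.sub("", html)


sample = ("<p>before</p>"
          "<script type='text/javascript'>"
          "document.write('<scr' + 'ipt src=\"x.js\"></scr' + 'ipt>');"
          "</script>"
          "<p>after</p>")
cleaned = strip_scripts(sample)
print(cleaned)  # <p>before</p><p>after</p>
```

With the original code this would become soup = BeautifulSoup(strip_scripts(html)).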
I just tried your code on Windows with:
Python: 2.6 (same as yours)
BeautifulSoup: 3.0.8.1 (latest)
I can't reproduce this. Are you using the latest code from the 3.0 series, which is meant for Python 2.x, rather than the 3.1 series, which is for Python 3 [0]? Sorry, but I can't think of any other clues right now.
[0] http://www.crummy.com/software/BeautifulSoup/#Download
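To answer the version question, the installed release can be read from the module's __version__ string. The traceback itself hints at the answer: it reaches HTMLParser.py through self.builder.feed, whereas, to my knowledge, the 3.0.x series is built on sgmllib's SGMLParser, so the 3.1 series is the likely suspect. A minimal check, assuming the 3.x module name BeautifulSoup (the function name below is hypothetical):

```python
def installed_beautifulsoup_version():
    """Return the BeautifulSoup 3.x version string, or None if it isn't importable."""
    try:
        import BeautifulSoup  # module name used by the 3.x releases
    except ImportError:
        return None
    return getattr(BeautifulSoup, "__version__", None)


version = installed_beautifulsoup_version()
print("BeautifulSoup version: %s" % version)
if version is not None and not version.startswith("3.0."):
    # 3.0.x is the series recommended above for Python 2
    print("Not a 3.0.x release; the answer suggests switching to 3.0.8.1.")
```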
I have tried your code, and it works. My environment: ActivePython 2.6.6.15, BeautifulSoup 3.0.8.1. I printed out the soup variable and it contains the content of "Boredom Induced Post". When I opened http://www.bugsandbrains.blogspot.com in a browser, however, it showed a Wave Sandbox login page. No clue about what is wrong :(
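Since neither answer could reproduce the error, and the page served to a browser apparently differs from the one the script received, it may help to keep the exact HTML that failed and look at the position the parser reported (line 1152, column 16 in the traceback). A small helper for that is sketched here; the name error_context is hypothetical, and it takes the 1-based line and column as they are printed in the HTMLParseError message:

```python
def error_context(html, lineno, col, radius=2):
    """Return the lines around a reported (1-based line, 1-based column) position.

    A caret is placed under the reported column of the reported line.
    """
    lines = html.splitlines()
    start = max(lineno - 1 - radius, 0)
    end = min(lineno + radius, len(lines))
    out = []
    for i in range(start, end):
        out.append("%5d: %s" % (i + 1, lines[i]))  # 7-character prefix
        if i == lineno - 1:
            # 7-char prefix plus (col - 1) characters of text before the caret
            out.append(" " * (6 + col) + "^")
    return "\n".join(out)


page = "a\nb\n<script>x='</scr' + 'ipt>'</script>\nc\nd"
print(error_context(page, 3, 12))
```

For example, wrapping the original BeautifulSoup(html) call in try/except and printing error_context(html, 1152, 16) on failure would show the offending line of the page that was actually fetched.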