for link in br.links(url_regex="inquiry-results.jsp"):
    cb[link.url] = link
for page_link in cb.values():
    for link in br.links(url_regex="inquiryDetail.jis"):
        ....................
        url = link.absolute_url
        br.follow_link(link)
        ....................
    br.follow_link(page_link)
This is my code. Basically, it extracts the page links [links to pages 1, 2, 3, 4, 5, ...] and the data links from a particular page. It then follows each data link, extracts some data, and when done moves on to the next page. But I always get this error:
Traceback (most recent call last):
  File "C:\python27\test.py", line 95, in <module>
    for link in br.links(url_regex="inquiryDetail.jis"):
  File "build\bdist.win32\egg\mechanize\_mechanize.py", line 405, in links
mechanize._mechanize.BrowserStateError: not viewing HTML
Can anyone help?
Thanks to the link posted by loevborg, I've been using this:
br.open('http://example.com')
br._factory.is_html = True
Now br.viewing_html() will evaluate to True.
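Since the flag describes the page the browser is currently on, it probably has to be set again after every navigation, not only after the first br.open. Here is a minimal sketch of that idea, assuming mechanize resets the flag whenever a new response is loaded (I have not checked this for every version). The helper names open_as_html and follow_as_html are made up for illustration, and _factory.is_html is a private attribute that may change between mechanize releases:

```python
def open_as_html(br, url):
    """Open url, then mark the current response as HTML."""
    response = br.open(url)
    # Private attribute taken from the workaround above; not a public API.
    br._factory.is_html = True
    return response


def follow_as_html(br, link):
    """Follow a link, then mark the resulting response as HTML."""
    response = br.follow_link(link)
    # Assumption: the flag must be set again for each new response.
    br._factory.is_html = True
    return response
```

In the loop from the question, follow_as_html(br, link) and follow_as_html(br, page_link) would replace the two br.follow_link calls. Note that this only silences the check; if the response really is not HTML, br.links() may still find nothing useful.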
This seems to be related to a check to see if the response is valid HTML:
http://github.com/jjlee/mechanize/blob/master/mechanize/_mechanize.py#L440
Perhaps the response you get is XHTML, or has invalid headers? There may be some way to override the is_html attribute (like here).
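To find out whether the headers are the culprit, it may help to look at the Content-Type the server actually sends. A rough sketch follows; classify_content_type is a hypothetical helper, and the lists of media types are my own assumption for illustration, not a copy of what mechanize checks internally:

```python
# Media types commonly used for HTML and XHTML. These lists are an
# assumption for illustration, not taken from mechanize's source.
HTML_TYPES = ("text/html",)
XHTML_TYPES = ("application/xhtml+xml", "application/xml", "text/xml")


def classify_content_type(header_value):
    """Return 'html', 'xhtml', 'missing', or 'other' for a Content-Type value."""
    if not header_value:
        return "missing"
    # Drop parameters such as '; charset=utf-8' and normalise case.
    media_type = header_value.split(";", 1)[0].strip().lower()
    if media_type in HTML_TYPES:
        return "html"
    if media_type in XHTML_TYPES:
        return "xhtml"
    return "other"


# Usage with a mechanize browser (not run here):
#     response = br.open(url)
#     print(classify_content_type(response.info().get("Content-Type")))
```

If this reports 'xhtml', 'missing', or 'other' for the page that triggers the error, that would be consistent with the response not being recognised as HTML.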
Identifying your script as a browser before calling br.open might help with this:
br.addheaders = [('User-agent','Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Ubuntu Chromium/45.0.2454101')]