
Python urllib2 parse html problem

Developer https://www.devze.com 2023-03-24 02:24 Source: Internet

I am using mechanize to parse the HTML of a website, but with this particular website I get a strange result.

from mechanize import Browser
br = Browser()
r = br.open("http://www.heavenplaza.com")
result = r.read()

The result is something I cannot understand. You can see it here: http://paste2.org/p/1556077

Does anyone have a method to get that website's HTML, with mechanize or urllib?

Thanks


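import urllib2, StringIO, gzip

# The response body appears to be gzip-compressed, so wrap the raw bytes
# in a file-like object and let GzipFile decompress them.
f = urllib2.urlopen("http://www.heavenplaza.com")
data = StringIO.StringIO(f.read())
gzipper = gzip.GzipFile(fileobj=data)
print gzipper.read()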
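
The snippet above assumes the body is always gzip-compressed. A slightly more defensive variant, sketched here in Python 3 (where `urllib2` and `StringIO` have become `urllib.request` and `io`), checks for the gzip magic bytes first and passes anything else through untouched. The helper name `maybe_gunzip` is made up for illustration, and the sample bytes stand in for whatever the site actually returns.

```python
import gzip
import io

# Every gzip stream starts with these two bytes.
GZIP_MAGIC = b"\x1f\x8b"


def maybe_gunzip(raw):
    """Return the decompressed bytes if raw looks like gzip, else raw unchanged."""
    if raw[:2] == GZIP_MAGIC:
        # Same idea as the answer above: wrap the bytes in a file-like object.
        with gzip.GzipFile(fileobj=io.BytesIO(raw)) as gz:
            return gz.read()
    return raw


# Stand-in for f.read(): one compressed body and one plain body.
compressed_body = gzip.compress(b"<html>hello</html>")
plain_body = b"<html>plain</html>"

print(maybe_gunzip(compressed_body))
print(maybe_gunzip(plain_body))
```

In a real fetch you would pass the bytes returned by `urlopen(...).read()` to the helper instead of the sample bodies.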


I quickly checked the script in the console and the site was returning crap. You probably need to spoof your HTTP user agent so that the site doesn't think you are a robot.
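
A minimal sketch of that suggestion, written for Python 3's `urllib.request`: build the request with an explicit `User-Agent` header instead of the library default. The user-agent string and the helper names `build_request` and `fetch` are placeholders chosen for illustration, and whether this particular site actually filters on the header is an assumption.

```python
import urllib.request

# Placeholder browser-like string; any real browser's value could be used instead.
BROWSER_UA = "Mozilla/5.0 (compatible; example-browser)"


def build_request(url, user_agent=BROWSER_UA):
    """Create a Request that announces itself with the given User-Agent."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch(url):
    """Download the page body as bytes (performs a network call)."""
    with urllib.request.urlopen(build_request(url), timeout=10) as resp:
        return resp.read()


req = build_request("http://www.heavenplaza.com")
# urllib stores header names capitalized as "User-agent".
print(req.get_header("User-agent"))
```

With mechanize, the usual equivalent is to set `br.addheaders = [("User-agent", "...")]` on the `Browser` object before calling `br.open(...)`.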

For comparison, the same script works against http://www.google.com.
