Convert Google search results into JSON in Python 3.1

Developer https://www.devze.com 2022-12-18 00:12 Source: Internet
I am writing a Python program that feeds a search term to Google using the Google search API and downloads the first 10 results. I was able to do this in Python 2.6 as follows:

query = urllib.parse.urlencode({'q' : 'searchterm','start' : k},doseq=false)
url = 'http://ajax.googleapis.com/ajax/services/search/web?v=1.0&%s' \
             % (query)
results = urllib.urlopen(url)
resultsjson = json.loads(results.read())
betterResults += resultsjson["responseData"]["results"]

Google's search API returns the results as JSON, so I used the above code to load the results into a JSON object of my own and collect them into a list (betterResults).

When I switched over to Python 3, my program began throwing exceptions. Apparently, in Python 2.6 the object returned by urlopen() is a file-like object that can be loaded as JSON. In Python 3.1, the object returned is an HTTPResponse object, which does have a read() method, as the json module requires, but what it gives back is a bytes object. I was therefore unable to access the information as I had in 2.6.

Is there any way to access the JSON returned by Google? How can I get the results in Python 3 and still select which fields I want, as I was able to do with the parsed JSON in 2.6?

Thank you very much, bsg


You'll need to decode the bytes object if you want to use it with json.loads:

resultjson =  json.loads(results.read().decode())

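The docs also suggest passing an encoding parameter to the loads function. Note that this keyword has been deprecated since Python 3.1 and was removed in Python 3.9, so decoding the bytes yourself is the safer route: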

json.loads(results.read(), encoding=<encoding-type>)

I think Lennart has an explanation of how to get the encoding type.
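
As a minimal sketch of the decode step, the snippet below uses a hard-coded bytes payload in place of a live response. The payload and its field values are made up for illustration; only the responseData/results nesting comes from the question. It assumes the body is UTF-8.

```python
import json

# Hypothetical stand-in for results.read(): urlopen() bodies arrive as bytes.
raw = b'{"responseData": {"results": [{"url": "http://example.com/"}]}}'

# Decode the bytes to str first; UTF-8 is an assumption, pass the real
# charset if the server reports a different one.
text = raw.decode("utf-8")
resultsjson = json.loads(text)

# Collect the result entries into a list, as in the question.
better_results = []
better_results += resultsjson["responseData"]["results"]
print(better_results[0]["url"])
```

On Python 3.6 and later, json.loads() also accepts bytes encoded as UTF-8, UTF-16 or UTF-32 directly, but the explicit decode works on every Python 3 version and makes the charset assumption visible.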


The object returned by urlopen is file-like; you are wrong there. But you are calling json.loads(), which expects a string. It is json.load() that expects a file-like object.

However, json.load() expects the result of the read() method to be a string, while the read() you get here returns bytes, so you need to decode the bytes to a string first.
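
The difference between the two functions can be sketched without a network call. Here an io.BytesIO object stands in for the binary file-like response, and the payload is invented for illustration. Wrapping the stream in io.TextIOWrapper is one possible way (not necessarily what the answer had in mind) to make read() return str, again assuming UTF-8.

```python
import io
import json

# Hypothetical stand-in for the bytes a binary file-like response would yield.
payload = b'{"responseData": {"results": [{"title": "example"}]}}'

# json.loads() takes the decoded text directly.
from_string = json.loads(payload.decode("utf-8"))

# json.load() takes a file-like object; TextIOWrapper decodes the byte
# stream so that read() yields str. UTF-8 is assumed here.
binary_stream = io.BytesIO(payload)
from_file = json.load(io.TextIOWrapper(binary_stream, encoding="utf-8"))

print(from_string == from_file)
```

Both routes give the same dictionary; which one to use is mostly a matter of whether you already hold the whole body in memory.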

So, something like this:

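import json
import urllib.parse
import urllib.request

# k (the result offset) and betterResults (a list) come from the surrounding program
query = urllib.parse.urlencode({'q': 'searchterm', 'start': k}, doseq=False)
url = 'http://ajax.googleapis.com/ajax/services/search/web?v=1.0&%s' % query
results = urllib.request.urlopen(url)
# assumes the Content-Type header ends with a charset, e.g. "...; charset=utf-8"
encoding = results.getheader('content-type').split('=')[-1]
resultsjson = json.loads(results.read().decode(encoding))
betterResults += resultsjson["responseData"]["results"]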

Might work. (I didn't test it).
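
Splitting the Content-Type header on '=' only works when the header actually carries a charset parameter as its last item. As a hedged alternative, the sketch below parses the header with the standard library's email.message.Message, which is the base class of the headers object on an HTTPResponse. The helper name charset_from_content_type and the UTF-8 fallback are assumptions made for this example.

```python
from email.message import Message


def charset_from_content_type(content_type, default="utf-8"):
    """Return the charset parameter of a Content-Type value, or default."""
    msg = Message()
    msg["Content-Type"] = content_type
    # get_content_charset() returns None when no charset parameter is present.
    return msg.get_content_charset() or default


print(charset_from_content_type("text/javascript; charset=ISO-8859-1"))
print(charset_from_content_type("application/json"))
```

With a live response, the same method should be reachable as results.headers.get_content_charset(), so the helper is only needed when you have the raw header string.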
