Google Search from a Python App

I'm trying to run a Google search query from a Python app. Is there any Python interface out there that would let me do this? If there isn't, does anyone know which Google API will enable me to do this? Thanks.


There's a simple example here (peculiarly missing some quotes;-). Most of what you'll see on the web is Python interfaces to the old, discontinued SOAP API -- the example I'm pointing to uses the newer and supported AJAX API; that's definitely the one you want!-)

Edit: here's a more complete Python 2.6 example with all the needed quotes &c;-)...:

#!/usr/bin/python
import json
import urllib

def showsome(searchfor):
  query = urllib.urlencode({'q': searchfor})
  url = 'http://ajax.googleapis.com/ajax/services/search/web?v=1.0&%s' % query
  search_response = urllib.urlopen(url)
  search_results = search_response.read()
  results = json.loads(search_results)
  data = results['responseData']
  print 'Total results: %s' % data['cursor']['estimatedResultCount']
  hits = data['results']
  print 'Top %d hits:' % len(hits)
  for h in hits: print ' ', h['url']
  print 'For more results, see %s' % data['cursor']['moreResultsUrl']

showsome('ermanno olmi')


Here is Alex's answer ported to Python 3:

#!/usr/bin/python3
import json
import urllib.request, urllib.parse

def showsome(searchfor):
  query = urllib.parse.urlencode({'q': searchfor})
  url = 'http://ajax.googleapis.com/ajax/services/search/web?v=1.0&%s' % query
  search_response = urllib.request.urlopen(url)
  search_results = search_response.read().decode("utf8")
  results = json.loads(search_results)
  data = results['responseData']
  print('Total results: %s' % data['cursor']['estimatedResultCount'])
  hits = data['results']
  print('Top %d hits:' % len(hits))
  for h in hits: print(' ', h['url'])
  print('For more results, see %s' % data['cursor']['moreResultsUrl'])

showsome('ermanno olmi')
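
When Google throttles or rejects a request, the responseData field in the JSON comes back as null and the port above raises a TypeError on data['cursor']. Here is a minimal defensive sketch along the same lines (showsome_safe is just an illustrative name; responseStatus and responseDetails are the AJAX API's documented status fields, and HTTP-level errors from urlopen are still not caught):

#!/usr/bin/python3
import json
import urllib.request, urllib.parse

def showsome_safe(searchfor):
    # showsome_safe is an illustrative name, not part of the original answer
    query = urllib.parse.urlencode({'q': searchfor})
    url = 'http://ajax.googleapis.com/ajax/services/search/web?v=1.0&%s' % query
    search_response = urllib.request.urlopen(url)
    results = json.loads(search_response.read().decode("utf8"))
    # The endpoint sets responseStatus != 200 and responseData to null on
    # errors or throttling, so check both before indexing into the payload.
    if results.get('responseStatus') != 200 or results.get('responseData') is None:
        print('Search failed: %s' % results.get('responseDetails'))
        return
    data = results['responseData']
    print('Total results: %s' % data['cursor']['estimatedResultCount'])
    for h in data['results']:
        print(' ', h['url'])

showsome_safe('ermanno olmi')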


Here's my approach to this: http://breakingcode.wordpress.com/2010/06/29/google-search-python/

A couple of code examples:

    # Get the first 20 hits for: "Breaking Code" WordPress blog
    from google import search
    for url in search('"Breaking Code" WordPress blog', stop=20):
        print(url)

    # Get the first 20 hits for "Mariposa botnet" in Google Spain
    from google import search
    for url in search('Mariposa botnet', tld='es', lang='es', stop=20):
        print(url)

Note that this code does NOT use the Google API, and is still working to date (January 2012).
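
Google does tend to block clients that fire off too many requests in a row, so it can also help to space them out. A small sketch, assuming the num (results per page) and pause (seconds to wait between HTTP requests) keyword arguments that this library's search() function exposes:

    # Spread the requests out to reduce the chance of getting blocked.
    # num and pause are assumed from the library's signature; adjust them
    # if your version of the package differs.
    from google import search
    for url in search('"Breaking Code" WordPress blog', num=10, stop=20, pause=5.0):
        print(url)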


I am new to Python and was investigating how to do this. None of the examples provided here worked properly for me: some get blocked by Google after only a few requests, and some are outdated. Parsing the Google search HTML (adding a User-Agent header to the request) will work until Google changes the HTML structure again. You can use the same logic to search any other search engine by looking at its HTML (view-source).

import urllib2

def getgoogleurl(search, siteurl=False):
    if siteurl == False:
        return 'http://www.google.com/search?q=' + urllib2.quote(search)
    else:
        return 'http://www.google.com/search?q=site:' + urllib2.quote(siteurl) + '%20' + urllib2.quote(search)

def getgooglelinks(search, siteurl=False):
    # Google returns 403 without a User-Agent header
    headers = {'User-agent': 'Mozilla/11.0'}
    req = urllib2.Request(getgoogleurl(search, siteurl), None, headers)
    site = urllib2.urlopen(req)
    data = site.read()
    site.close()

    # no BeautifulSoup because the Google HTML is generated with JavaScript;
    # just slice out the results section between the "res" and "foot" divs
    start = data.find('<div id="res">')
    end = data.find('<div id="foot">')
    if data[start:end] == '':
        # error, no links to find
        return False
    else:
        links = []
        data = data[start:end]
        start = 0
        end = 0
        while start > -1 and end > -1:
            # get only results of the provided site
            if siteurl == False:
                start = data.find('<a href="/url?q=')
            else:
                start = data.find('<a href="/url?q=' + str(siteurl))
            data = data[start + len('<a href="/url?q='):]
            end = data.find('&amp;sa=U&amp;ei=')
            if start > -1 and end > -1:
                link = urllib2.unquote(data[0:end])
                data = data[end:len(data)]
                if link.find('http') == 0:
                    links.append(link)
        return links

Usage:

links = getgooglelinks('python','http://www.stackoverflow.com/')
for link in links:
    print link
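
The siteurl argument is optional; leaving it at its default of False searches all of Google rather than a single site:

# No site restriction: siteurl keeps its default value of False.
# getgooglelinks returns False when it finds no results, so check first.
links = getgooglelinks('python')
if links:
    for link in links:
        print link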

(Edit 1: Adding a parameter to narrow the google search to a specific site)

(Edit 2: When I added this answer I was coding a Python script to search subtitles. I recently uploaded it to Github: Subseek)
