How to get Google search result links and store them in an array using Mechanize

Developer https://www.devze.com 2023-03-07 23:05 Source: Internet
I want to get the 10 Google search result links (href) using Mechanize, so I wrote this code, but it does not return the right Google search results. What should I write?

    @searchword = params[:q]
    @sitesurl = Array.new
    agent = Mechanize.new
    page = agent.get("http://www.google.com")
    search_form = page.form_with(:name => "f")
    search_form.field_with(:name => "q").value = @searchword.to_s 
    search_results = agent.submit(search_form)
    count = 0
    c = 0
    while c < 10
      if (search_results/"li")[count].attributes['class'].to_s == "g knavi"
        site = (search_results/"li")[count]
        code = (site/"a")[0].attributes['href']
        @sitesurl << code
        c += 1
      end
      count += 1
    end


Something like this should work:

    @searchword = params[:q]
    @sitesurl = Array.new
    agent = Mechanize.new
    page = agent.get("http://www.google.com")
    search_form = page.form_with(:name => "f")
    search_form.field_with(:name => "q").value = @searchword.to_s
    search_results = agent.submit(search_form)

    (search_results/"li.g").each do |result|
      @sitesurl << (result/"a").first.attribute('href') if result.attribute('class').to_s == 'g knavi'
    end


This is an updated version. Tested and works fine:

    require 'rubygems'
    require 'mechanize'
    require 'hpricot'

    agent = Mechanize.new
    agent.user_agent_alias = 'Linux Firefox'
    page = agent.get('http://google.com/')
    google_form = page.form('f')
    google_form.q = 'your search'

    page = agent.submit(google_form)

    page.links.each do |link|
      # Result links are redirects of the form /url?q=<target>&...
      if link.href.to_s =~ /url.q/
        str = link.href.to_s
        str_list = str.split(%r{=|&})
        url = str_list[1]
        # If you need cached URLs too, remove this condition and use every URL
        unless url.include? "webcache"
          puts url
        end
      end
    end

To store the results, create a new array and push each URL onto it instead of printing it.
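As a rough sketch of that last step, the snippet below collects the URLs into an array instead of printing them. It assumes the hrefs look like `/url?q=<target>&...`, as in the code above. The helper name `extract_result_urls`, its `skip_cached` option, and the sample hrefs are made up for illustration; in real use the input would be something like `page.links.map(&:href)`. It only uses Ruby's standard `uri` library, so it runs without hitting Google.

```ruby
require 'uri'

# Hypothetical helper (not part of Mechanize): pull the target URL out of
# redirect hrefs of the form "/url?q=<target>&..." and collect them in an array.
def extract_result_urls(hrefs, skip_cached: true)
  hrefs.each_with_object([]) do |href, urls|
    href = href.to_s
    next unless href.start_with?('/url?')

    # Take everything after the first "?" and decode it as form data.
    query = href.split('?', 2)[1].to_s
    target = URI.decode_www_form(query).assoc('q')&.last
    next if target.nil? || target.empty?
    # Same filter as above: drop cached copies unless asked to keep them.
    next if skip_cached && target.include?('webcache')

    urls << target
  end
end

# Sample hrefs standing in for page.links.map(&:href) (assumed shape).
sample_hrefs = [
  '/url?q=http://example.com/page&sa=U&ei=abc',
  '/url?q=http://webcache.googleusercontent.com/search%3Fq%3Dcache&sa=U',
  '/search?q=something',
  nil,
  '/url?q=http://example.org/a%3Fb%3D1&sa=U'
]

sites_url = extract_result_urls(sample_hrefs)
puts sites_url
```

Decoding the query string rather than splitting on `=` and `&` also un-escapes percent-encoded characters in the target URL, which the plain split leaves as-is.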


It's not working now. I suppose that could be because Google recently changed the HTML and URLs of its search results.


Nowadays, the answers above don't work anymore. We have released our own gem that is easy to use and allows custom locations:

    query = GoogleSearchResults.new q: "coffee", location: "Portland"
    hash_results = query.get_hash

Repository: https://github.com/serpapi/google-search-results-ruby
