Ruby - Mechanize: Select link by classname and other questions

Developer https://www.devze.com 2022-12-18 15:05 Source: Internet
I'm currently taking a look at Mechanize. I'm fairly new to Ruby, so please be patient.

I wrote a little test script:

require 'rubygems'
require 'mechanize'

agent = WWW::Mechanize.new

page = agent.get('http://www.google.de')
pp page.title
google_form = page.form_with(:name => 'f')
google_form.q = 'test'
page = agent.submit(google_form)
pp page.title

page_links = Array.new
page.links.each do |ll|
  page_links << ll
end
puts page_links.size

This works, but page_links contains more than just the search results. It also includes Google's own links such as Login, Pictures, and so on. The result links have the style class "1". Is it possible to select only the links with class == 1? How do I achieve this?

Is it possible to modify the "agentalias"? If I own a website that includes Google Analytics or something similar, which browser client will I see in GA when I visit my site with Mechanize?

Can I select elements by their ID instead of their name? I tried to use

my_form = page.form_with(:id => 'myformid')

But this does not work.


In cases like yours I use Nokogiri's DOM search. Here is your code, slightly rewritten:

require 'rubygems'
require 'mechanize'

agent = Mechanize.new

page = agent.get('http://www.google.de')
pp page.title
google_form = page.form_with(:name => 'f')
google_form.q = 'test'
page = agent.submit(google_form)
pp page.title

page_links = Array.new
# 'h3.r > a.l' may be a more precise selector here
page.parser.css("a.l").each do |ll|
  # page.parser is a Nokogiri::HTML::Document
  page_links << ll
  puts ll.text + "=>" + ll["href"]
end
puts page_links.size

This article is probably a good place to start: getting-started-with-nokogiri. By the way, the samples in the article also deal with Google search ;)


You can build a list of just the search result links by changing your code as follows:

page.links.each do |ll|
  cls = ll.attributes.attributes['class']
  page_links << ll if cls && cls.value == 'l'
end

For each element ll in page.links, ll.attributes is a Nokogiri::XML::Element and ll.attributes.attributes is a Hash containing the attributes on the link. That is why ll.attributes.attributes is needed to get at the actual class, and why the nil check is needed before comparing the value to 'l'.

The problem with using :id in the criteria to find a form is that it clashes with Ruby's Object#id method, which returns a Ruby object's internal id. I'm not sure what the workaround for this is. You would have no problem selecting the form by some other attribute (e.g. its action).


I believe the selector you are looking for is:
:dom_id
e.g. in your case:
my_form = page.form_with(:dom_id => 'myformid')
