
How to manipulate DOM with Ruby on Rails

开发者 (https://www.devze.com), 2022-12-09 19:18, source: web
As the title says, I have some DOM manipulation tasks. For example, I want to:
- find all H1 elements that are blue,
- find all text with a font size of 12px,
- etc.

How can I do it with Rails?

Thank you. :)

Update

I have been researching how to extract web page content, based on this paper: http://www.springerlink.com/index/A65708XMUR9KN9EA.pdf

In summary, the steps are:

  1. Get the URL of the page I want to extract content from (a single web page).
  2. Grab elements from the page based on visual rules (e.g. grab all H1 elements that are blue).
  3. Process those elements with my algorithm.
  4. Save the result to my database.
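The four steps map naturally onto a small pipeline. The sketch below is hypothetical: `ExtractionPipeline`, its keyword arguments, and the in-memory array standing in for the database are illustrative names only. Each step is passed in as a lambda, so fetching (for example with `URI.open` from open-uri), the visual rules, your algorithm, and the persistence layer (for example an ActiveRecord model in a Rails app) can be swapped independently.

```ruby
# Hypothetical pipeline mirroring the four steps: fetch, grab, process, save.
class ExtractionPipeline
  def initialize(fetch:, grab:, process:, save:)
    @fetch   = fetch    # step 1: URL -> HTML string
    @grab    = grab     # step 2: HTML string -> elements matching the visual rules
    @process = process  # step 3: elements -> result of your algorithm
    @save    = save     # step 4: persist the result
  end

  # Runs the steps in order and returns the processed result.
  def run(url)
    html     = @fetch.call(url)
    elements = @grab.call(html)
    result   = @process.call(elements)
    @save.call(url, result)
    result
  end
end

# Stub steps for illustration only: a real fetch would download the page,
# and a real grab step would use an HTML parser instead of a regex.
database = [] # stands in for a real table
pipeline = ExtractionPipeline.new(
  fetch:   ->(url) { "<h1>Title</h1><h1>Other</h1>" },
  grab:    ->(html) { html.scan(%r{<h1>(.*?)</h1>}).flatten },
  process: ->(elements) { elements.map(&:upcase) },
  save:    ->(url, result) { database << { url: url, result: result } }
)

p pipeline.run('http://example.com/page')
```

Keeping each step behind a lambda also makes the visual-rule step easy to test on fixed HTML strings before pointing the pipeline at live pages.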

(Sorry for my bad English.)


If what you're trying to do is manipulate HTML documents inside a Rails application, you should take a look at Nokogiri.

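Nokogiri can search a document with XPath (it also supports CSS selectors). The following fetches a page and prints the text of every link with the "blue" CSS class that is a direct child of an h1. Note that this matches a class name in the markup, not the color the element is actually rendered in.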

require 'nokogiri'
require 'open-uri'

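# Fetch and parse the page. URI.open comes from open-uri; plain Kernel#open
# no longer opens URLs as of Ruby 3.0.
doc = Nokogiri::HTML(URI.open('https://stackoverflow.com'))
# Every <a class="blue"> that is a direct child of an <h1>
doc.xpath('//h1/a[@class="blue"]').each do |link|
  puts link.content
end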
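The predicate `[@class="blue"]` matches only when the attribute is exactly `blue`, so `<a class="title blue">` would be skipped. Below is a minimal sketch of matching one class token instead. To stay runnable without installing gems it uses REXML from Ruby's standard distribution and assumes well-formed markup; with Nokogiri the same selection can be written as `doc.css('h1 > a.blue')`.

```ruby
require 'rexml/document'

html = <<~XHTML
  <html><body>
    <h1><a class="blue">Exact match</a></h1>
    <h1><a class="title blue">Token match</a></h1>
    <h1><a class="blueish">Not a match</a></h1>
    <div><a class="blue">Not inside an h1</a></div>
  </body></html>
XHTML

doc = REXML::Document.new(html)

# Same path as the answer's XPath: <a> elements that are direct children of <h1>.
links = REXML::XPath.match(doc, '//h1/a')

# Exact comparison, equivalent to [@class="blue"].
exact = links.select { |a| a.attributes['class'] == 'blue' }

# Token comparison: the class attribute is a space-separated list of names.
token = links.select { |a| a.attributes['class'].to_s.split.include?('blue') }

exact.each { |link| puts link.text }
puts '---'
token.each { |link| puts link.text }
```

Splitting on whitespace is what keeps `blueish` out while still accepting `title blue`; a plain substring check on the attribute would accept both.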

If, on the other hand, you want to manipulate the DOM of the page currently loaded in the user's browser, look at JavaScript and jQuery instead. Rails runs on the server, so it can't do that.


http://railscasts.com/episodes/190-screen-scraping-with-nokogiri


To reliably determine the color of an arbitrary element on a web page, you would need to reverse engineer a browser (to accurately account for stylesheets, markup hacks, broken tags, images, etc.).

A far easier approach would be to embed an existing browser engine such as Gecko in a custom application of your own.

As your spider browses pages, it would pass each one to the embedded Gecko instance, where you could call getComputedStyle to read the color an individual element actually ends up with.
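Short of embedding a browser, the closest pure-Ruby approximation is to read only inline `style` attributes. The sketch below does that for the question's two example tasks. `inline_style` and `elements_with_style` are hypothetical helpers, the markup is assumed to be well-formed XHTML (parsed with REXML from Ruby's standard distribution), and anything set through stylesheets, inheritance, or browser defaults is invisible to it, which is exactly the gap getComputedStyle closes.

```ruby
require 'rexml/document'

# Hypothetical helper: parse an inline style attribute such as
# "color: blue; font-size: 12px" into {"color" => "blue", "font-size" => "12px"}.
def inline_style(element)
  style = element.attributes['style'].to_s
  style.split(';').each_with_object({}) do |declaration, props|
    name, value = declaration.split(':', 2)
    next if value.nil? # skip fragments without a "name: value" pair
    props[name.strip.downcase] = value.strip.downcase
  end
end

# Hypothetical helper: elements matching the XPath whose *inline* style sets
# property to value. Stylesheets, inheritance, and defaults are ignored.
def elements_with_style(doc, xpath, property, value)
  REXML::XPath.match(doc, xpath).select do |element|
    inline_style(element)[property] == value
  end
end

html = <<~XHTML
  <html><body>
    <h1 style="color: blue">Blue heading</h1>
    <h1 style="color: red">Red heading</h1>
    <p style="font-size: 12px">Small text</p>
    <span style="font-size:12px; color: blue">Small blue span</span>
  </body></html>
XHTML

doc = REXML::Document.new(html)
blue_h1s   = elements_with_style(doc, '//h1', 'color', 'blue')     # task 1
small_text = elements_with_style(doc, '//*', 'font-size', '12px')  # task 2

puts blue_h1s.map(&:text).inspect
puts small_text.map(&:text).inspect
```

An `<h1>` colored blue by a `<style>` block or an external stylesheet would not be found here, which is why the answer recommends asking a real browser engine for computed styles instead.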

You originally mentioned wanting to use Ruby on Rails for this project, but Rails is a framework for writing presentational (web) applications and is a poor fit for a project like this.

As a starting point, I'd recommend you check out RubyGnome, and in particular RubyGnome's Gtk::MozEmbed functionality.

