Screen scraper that follows redirects and encodes to UTF-8

Developer (https://www.devze.com), 2023-02-23 01:13, source: web
I'm looking for a gem (or a combination of gems) that, given a URL, returns the page content as UTF-8. It should also follow redirects when the URL points somewhere else.

Does anyone know of such a gem?

Thanks!


Have you looked at Nokogiri? It seems to do what you are looking for in terms of encoding:

ENCODING:

Strings are always stored as UTF-8 internally. Methods that return text values will always return UTF-8 encoded strings. Methods that return XML (like to_xml, to_html and inner_html) will return a string encoded like the source document.

You can also automate some of your screen scraping with Mechanize (click links, submit forms, etc.). Mechanize builds on Nokogiri, so it's a nice complement to it.

Some webcasts you may want to look at:

  • Nokogiri: http://railscasts.com/episodes/190-screen-scraping-with-nokogiri
  • Mechanize: http://railscasts.com/episodes/191-mechanize
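The Nokogiri quote above covers encoding but not redirects. If you'd rather not pull in a gem just for fetching, Ruby's standard library can handle both steps. Below is a minimal sketch built on `Net::HTTP`, not Nokogiri's or Mechanize's API. `fetch_utf8` and `to_utf8` are hypothetical helper names, and the limit of 5 redirects is an arbitrary choice. The charset is read only from the `Content-Type` header, with UTF-8 as the fallback, so pages that declare their encoding only in a `<meta charset>` tag are not handled.

```ruby
require 'net/http'
require 'uri'

# Hypothetical helper: convert raw response bytes to a UTF-8 string using the
# charset named in the Content-Type header (falls back to UTF-8 if absent).
# Note: <meta charset> tags inside the HTML are NOT inspected here.
def to_utf8(body, content_type)
  charset = content_type.to_s[/charset=["']?([^\s;"']+)/i, 1] || 'UTF-8'
  body.to_s.b.force_encoding(charset)
      .encode('UTF-8', invalid: :replace, undef: :replace)
      .scrub # same-encoding conversions skip replacement, so scrub explicitly
rescue ArgumentError, Encoding::ConverterNotFoundError
  # Unknown or unconvertible charset name: treat the bytes as UTF-8 and
  # replace any invalid sequences.
  body.to_s.b.force_encoding('UTF-8').scrub
end

# Hypothetical helper: GET +url+, following up to +limit+ redirects,
# and return the final page body as a UTF-8 string.
def fetch_utf8(url, limit = 5)
  raise ArgumentError, 'too many redirects' if limit.zero?

  uri = URI(url)
  response = Net::HTTP.get_response(uri)
  case response
  when Net::HTTPRedirection
    # The Location header may be relative, so resolve it against the current URL.
    next_url = URI.join(uri.to_s, response['location']).to_s
    fetch_utf8(next_url, limit - 1)
  when Net::HTTPSuccess
    to_utf8(response.body, response['content-type'])
  else
    response.value # raises a Net::HTTP error for any other non-2xx status
  end
end
```

The returned string is plain UTF-8 text, so it can be handed to Nokogiri for parsing afterwards. Mechanize is still the better fit once you need cookies, forms, or link clicking.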