Nokogiri - Works with XML, not so much with HTML

Developer https://www.devze.com 2023-02-24 04:18 Source: Internet
I'm having an issue getting Nokogiri to work properly. I'm using version 1.4.4 with Ruby 1.9.2.

I have both libxml2 and libxslt installed and up to date. When I run a Ruby script against an XML file, it works great.

require 'nokogiri'

doc = Nokogiri::XML(File.open("test.xml"))
doc = doc.css("name").each do |node|
    puts node.text
end

At the command line, I run ruby test.rb, and it returns:

Name 1
Name 2
Name 3

And the crowd goes wild. I tweak a few things, make a few adjustments to the code...

require 'nokogiri'
require 'open-uri'

doc = Nokogiri::HTML(open("http://domain.tld"))
doc = doc.css("p").each do |node|
    puts node.text
end

Back at the command line, ruby test.rb returns... nothing! Just a new, empty line.

Is there any reason that it will work with an XML file, but not HTML?


To debug this sort of problem we need more information from you. Since you're not giving a working URL, and because we know that Nokogiri works fine for this sort of problem, the debugging falls on you.

Here's what I would do to test:

In IRB:

  1. Do you get output when you do: open('http://whateverURLyouarehiding.com').read?
  2. If that returns a valid document, what do you get when you wrap the previous open statement in Nokogiri::HTML(...)? Keep the .read from the previous step, so Nokogiri is receiving the body of the page as a string, NOT an IO stream.
  3. Try #2 above, but remove the .read. That will tell if there's a problem with Nokogiri reading an IO stream, though I seriously doubt it has a problem since I use it all the time. At that point I'd suspect a problem on your system.
  4. If you're getting a document in #2 and #3, then the problem could be in your accessor; I suspect what you're looking for doesn't exist.
  5. If it does exist, then check the value of doc.errors after Nokogiri parses the document. It could be finding errors in the document, and, if so, they'll be captured there.