
Pulling a URL from a feed using Nokogiri

Developer https://www.devze.com 2023-03-27 23:49 Source: Internet


Let's say I have this in a document:

<entry>
  <link rel="replies" type="application/atom+xml" href="http://www.url.com/feeds/1/comments/default" title="Comments"/>
  <link rel="alternate" type="text/html" href="http://www.url.com/a_blog_post.html" title="A Blog Post"/>
</entry>

<entry>
  <link rel="replies" type="application/atom+xml" href="http://www.url.com/feeds/2/comments/default" title="Comments"/>
  <link rel="alternate" type="text/html" href="http://www.url.com/another_blog_post.html" title="Another Blog Post"/>
</entry>

I am trying to use Nokogiri to pull the URLs for each of the blog posts, but I am apparently going about it all wrong (I'm new to programming and having trouble understanding Nokogiri).

Here's what I have:

require 'nokogiri'
require 'open-uri'

def get_posts(url)
  posts = []
  doc = Nokogiri::HTML(open(url))
  doc.css('entry.alternate').each do |e|
    puts e['href']
    posts << e['href']
  end
  return posts
end 

puts "Enter feed url:"
url = gets.chomp
posts = get_posts(url)
puts posts.to_s

Any help would be great! I started this little thing to learn to program better, but I'm stuck. My output is currently [].

Your CSS selector is wrong: entry.alternate selects all entry elements with the class alternate (that is, something like <entry class="alternate" />).

I suppose you want to select all link elements whose rel attribute has the value alternate. The CSS selector for this is link[rel=alternate], so change your code like this:

doc.css('link[rel=alternate]').each do |e|
  puts e['href']
  posts << e['href']
end

You can read more about CSS selectors here: http://www.w3.org/TR/CSS2/selector.html.

Try doc.xpath("//entry/link[@rel='alternate']") instead of doc.css('entry.alternate'). It works for me.

If you only want the href attributes of the links, you can do it more simply:

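require 'nokogiri'
require 'open-uri'

def get_posts(url)
  # URI.open (from open-uri) fetches the URL; plain Kernel#open no longer accepts URLs in Ruby 3.0+
  Nokogiri::XML(URI.open(url))
    .xpath('//link[@rel="alternate"]/@href')  # select the href attributes themselves
    .map(&:value)                             # Nokogiri::XML::Attr -> String
end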

The XPath above selects not the link elements but the href attributes on those elements; map then turns this NodeSet of Nokogiri::XML::Attr objects into an array of their string values. Since this is the last expression in the method, that array is the return value.