The following code:
# fetch the top 300 podcasts from itunes
itunes_top_300 = Nokogiri.HTML(open("http://itunes.apple.com/us/rss/toppodcasts/limit=25/xml"))
# parse the returned xml with nokogiri
itunes_top_300.xpath('//feed/entry').each do |entry|
name = entry.xpath("//name").text
url = entry.xpath("//link/@href").text
category = entry.xpath("//category/@term").text
hosts = entry.xpath("//artist").text
summary = entry.xpath("//summary").text
artwork = entry.xpath("//image[@height='170']").text
return name + url
end
Is outputting in the view:
iTunes StoreThis American LifeNPR: Wait Wait... Don't Tell Me! PodcastStuff You Should KnowFreakonomics RadioNPR: Fresh Air PodcastNPR: Car Talk PodcastWNYC's RadiolabDespicable MePearls Before Swine Animated CartoonsThe Moth PodcastAPM: A Prairie Home Companion's News from Lake WobegonHarry Potter Years 1-5 PodcastAce On The HouseTakers - Takers Featurette: Executing the Heist - The Making of TakersNPR: Planet Money PodcastStuff You Missed in History ClassThe Dave Ramsey ShowBook ReviewGlobal NewsVampires Suck ClipsNPR: Science Friday PodcastOther Guys Crash and BurnBack to WorkNPR: All Songs Considered PodcastNPR: Tiny Desk Concerts Podcasthttp://itunes.apple.com/WebObjects/MZStore.woa/wa/viewTop?id=38&popId=3http://ax.itunes.apple.com/WebObjects/MZStoreServices.woa/ws/RSS/toppodcasts/limit=25/xml?cc=ushttp://itunes.apple.com/us/podcast/this-american-life/id201671138?uo=2&uo=2http://itunes.apple.com/us/podcast/npr-wait-wait-dont-tell-me/id121493804?uo=2&uo=2http://itunes.apple.com/us/podcast/stuff-you-should-know/id278981407?uo=2&uo=2http://itunes.apple.com/us/podcast/freakonomics-radio/id354668519?uo=2&uo=2http://itunes.apple.com/us/podcast/npr-fresh-air-podcast/id214089682?uo=2&uo=2http://itunes.apple.com/us/podcast/npr-car-talk-podcast/id253191823?uo=2&uo=2http://itunes.apple.com/us/podcast/wnycs-radiolab/id152249110?uo=2&uo=2http://itunes.apple.com/us/podcast/despicable-me/id399247154?uo=2&uo=2http://itunes.apple.com/us/podcast/pearls-before-swine-animated/id409382502?uo=2&uo=2http://itunes.apple.com/us/podcast/the-moth-podcast/id275699983?uo=2&uo=2http://itunes.apple.com/us/podcast/apm-a-prairie-home-companions/id215352157?uo=2&uo=2http://itunes.apple.com/us/podcast/harry-potter-years-1-5-podcast/id322144752?uo=2&uo=2http://itunes.apple.com/us/podcast/ace-on-the-house/id414294132?uo=2&uo=2http://itunes.apple.com/us/podcast/takers-takers-featurette-executing/id412910974?uo=2&uo=2http://itunes.apple.com/us/podcast/npr-planet-money-podcast/id290783428?uo=2&uo=2h
ttp://itunes.apple.com/us/podcast/stuff-you-missed-in-history/id283605519?uo=2&uo=2http://itunes.apple.com/us/podcast/the-dave-ramsey-show/id77001367?uo=2&uo=2http://itunes.apple.com/us/podcast/book-review/id120315179?uo=2&uo=2http://itunes.apple.com/us/podcast/global-news/id135067274?uo=2&uo=2http://itunes.apple.com/us/podcast/vampires-suck-clips/id405404825?uo=2&uo=2http://itunes.apple.com/us/podcast/npr-science-friday-podcast/id73329284?uo=2&uo=2http://itunes.apple.com/us/podcast/other-guys-crash-and-burn/id407622041?uo=2&uo=2http://itunes.apple.com/us/podcast/back-to-work/id415535037?uo=2&uo=2http://itunes.apple.com/us/podcast/npr-all-songs-considered-podcast/id79687345?uo=2&uo=2http://itunes.apple.com/us/podcast/npr-tiny-desk-concerts-podcast/id362115318?uo=2&uo=2
You can see that it's getting name for all elements before going on to url. I want it to evaluate name and then url, etc., for each element before moving on to the next. What am I doing wrong?
Thanks.
Two things are causing this problem. First, when you use return inside the each loop you exit on the first iteration, so the block runs once, not 25 times.
Second, you might not notice that it only runs once, because //name in the XPath matches every name element in the document, not just the ones inside the current entry.
Perhaps you could do something like this instead:
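require 'nokogiri'
require 'open-uri'
# Returns top 25 since the url includes limit=25
# URI.open (from open-uri) is needed on Ruby 3.0+, where Kernel#open no longer accepts URLs
itunes_top_25 = Nokogiri.XML(URI.open("http://itunes.apple.com/us/rss/toppodcasts/limit=25/xml"))
# The feed declares XML namespaces, so strip them before using bare element names in XPath
itunes_top_25.remove_namespaces!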
names_and_urls = itunes_top_25.xpath('//feed/entry').map do |entry|
name = entry.xpath("./name").text
url = entry.xpath("./link/@href").text
category = entry.xpath("./category/@term").text
hosts = entry.xpath("./artist").text
summary = entry.xpath("./summary").text
artwork = entry.xpath("./image[@height='170']").text
[name, url]
end
I changed //name to ./name so that it only matches within the current node. I also changed each to map, so that names_and_urls is assigned an array of all the values returned by the block. And I removed the call to return since it is not necessary.
So this will result in an array of arrays, each containing a name and a url.
By calling return you are stopping your each loop on the first iteration. Probably you didn't want that. Further, by using the xpath //name inside your loop, you are starting over at the top of the document and finding every name element in the whole document. Hence, when you find the first <entry> you then return a string formed by concatenating the text of every <name> element in the document with the href of every <link> element in the document.
You probably want either this:
require 'nokogiri'
require 'open-uri'
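# fetch the top podcasts from itunes (25 of them, since the URL says limit=25)
# Use XML instead of HTML
# URI.open is needed on Ruby 3.0+, where Kernel#open no longer accepts URLs
itunes_top_300 = Nokogiri::XML(URI.open("http://itunes.apple.com/us/rss/toppodcasts/limit=25/xml"))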
itunes_top_300.remove_namespaces!
itunes_top_300.xpath('//entry').each do |entry|
name = entry.xpath("name").text
url = entry.xpath("link/@href").text
puts "#{name}: #{url}"
end
#=> This American Life: http://itunes.apple.com/us/podcast/this-american-life/id201671138?uo=2&uo=2
#=> NPR: Wait Wait... Don't Tell Me! Podcast: http://itunes.apple.com/us/podcast/npr-wait-wait-dont-tell-me/id121493804?uo=2&uo=2
#=> Stuff You Should Know: http://itunes.apple.com/us/podcast/stuff-you-should-know/id278981407?uo=2&uo=2
...or perhaps this:
# Convert XML entries into an array of hashes
parsed = itunes_top_300.xpath('//entry').map do |entry|
name = entry.xpath("name").text
url = entry.xpath("link/@href").text
{ name:name, url:url }
end
require 'pp'
pp parsed[0..3]
#=> [{:name=>"This American Life",
#=> :url=>"http://itunes.apple.com/us/podcast/this-american-life/id201671138?uo=2&uo=2"},
#=> {:name=>"NPR: Wait Wait... Don't Tell Me! Podcast",
#=> :url=>"http://itunes.apple.com/us/podcast/npr-wait-wait-dont-tell-me/id121493804?uo=2&uo=2"},
#=> {:name=>"Stuff You Should Know",
#=> :url=>"http://itunes.apple.com/us/podcast/stuff-you-should-know/id278981407?uo=2&uo=2"},
#=> {:name=>"Freakonomics Radio",
#=> :url=>"http://itunes.apple.com/us/podcast/freakonomics-radio/id354668519?uo=2&uo=2"}]
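The effect of return inside an each block can also be seen in plain Ruby, with no XML involved. This is a rough sketch under stated assumptions: ENTRIES, first_only and all_pairs are hypothetical names, and the hashes merely stand in for parsed feed entries.

```ruby
# Hypothetical stand-ins for the parsed feed entries.
ENTRIES = [
  { name: 'First Show',  url: 'http://example.com/1' },
  { name: 'Second Show', url: 'http://example.com/2' }
].freeze

# `return` inside the block exits the whole method on the first pass,
# so only the first entry is ever looked at.
def first_only(entries)
  entries.each do |entry|
    return entry[:name] + entry[:url]
  end
end

# `map` collects the block's value for every entry.
def all_pairs(entries)
  entries.map { |entry| [entry[:name], entry[:url]] }
end

p first_only(ENTRIES)
p all_pairs(ENTRIES)
```

The first call yields a single concatenated string for the first entry only, while the second yields one [name, url] pair per entry.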
You declare variables holding the data you want, then throw most of it away because you only return name + url. Instead, try return name + url + category + thing1 + thing2, or better yet:
return [url,category,thing1,thing2]