Looking at this site here: http://www.grammy.com/nominees/search?artist=&title=&year=1958&genre=All
I can view all winners by year -- how can I scrape just the names of the winners on each page (for each year) and get them 开发者_如何学Pythonin a simple database?
thanks!
This will get you the actual names, cleaning them up a little bit and inserting them into a DB is an exercise left to you:
require 'rubygems'
require 'hpricot'
require 'open-uri'
html = open("http://www.grammy.com/nominees/search?artist=&title=&year=1958&genre=All")
doc = Hpricot(html)
doc.search("td.views-field-field-nominee-extended-value").each do |winner|
puts winner.inner_html.strip
end
精彩评论