Here's the code for the scraper
class Scrape
def perform
url = "# a long url"
agent = Mechanize.new
agent.get(url)
while(agent.page.link_with(:text => "Next Page \u00BB")) do
agent.page.search(".content").each do |item|
puts "."
House.create!({
# attributes...
})
end
agent.page.link_with(:text => "Next Page \u00BB").click
end
end
end
On my local environment I c开发者_JAVA百科an run it in the rails console just by typing
Scrape.new.delay.perform # to queue the job
rake jobs:work
and it works perfectly.
However running the analogous (with a worker running instead of rake jobs:work) in the Heroku console doesn't seem to do anything. I tried logging some lines in the Heroku log and I can get the url variable to log (so the method is at least getting called) but the "." which is there to show each time we run the while loop never appears and no Houses are created in the database.
Anyone any ideas what might be wrong?
Solved this problem myself, pretty obscure bug though. I was using ruby 1.9.2 in my local environment but I had the app deployed on a ruby 1.8.7 stack.
The important difference being the change in character encoding between the two ruby versions which meant that Mechanize couldn't find a link with the unicode encoded character "\u00BB" and thus didn't do any scraping.
精彩评论