I have a website that needs to run Nokogiri against many different websites to extract data. This process is run as a background job using the delayed_job gem. However, it takes around 3-4 seconds per page because it has to pause and wait for the other websites to respond. I am currently just running them by basically saying:
Websites.all.each do |website|
# screen scrape
end
I would like to execute them in batches rather than one at a time, so that I don't have to wait for a server response from every site (which can take up to 20 seconds on occasion).
What would be the best Ruby or Rails way to do this?
Thanks for your help in advance.
You might want to check out Typhoeus, which enables you to make parallel HTTP requests.
I found a short blog post here about using it with Nokogiri, but I haven't tried this myself.
Wrapped in a delayed_job (DJ) job, this should do the trick with little client-side latency.
You need to use delayed_job. Check out this Railscast.
Keep in mind most hosts charge for this type of thing.
You can also use the spawn plugin if you don't want to manage threads yourself; it is much, much easier!
This is literally all you need to do:
rails plugin install https://github.com/tra/spawn.git
Then, in your controller or model, wrap the code you want to run in the background in a spawn block.
For example:
spawn do
  # execute your code here :)
end
http://railscasts.com/episodes/171-delayed-job
https://github.com/tra/spawn
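If you would rather not install a plugin, the same idea can be roughly sketched with plain Ruby threads. This is not spawn's API, just the standard library. The method name fetch_in_batches, the batch_size default and the fetcher parameter are all made up for illustration, and it assumes your scraping code can work from a page body string:

```ruby
require 'net/http'
require 'uri'

# Default fetcher: a blocking GET that returns the response body as a string.
DEFAULT_FETCHER = ->(url) { Net::HTTP.get(URI(url)) }

# Hypothetical helper: fetches the URLs in slices of `batch_size`, using one
# thread per URL inside each slice so the slow responses overlap.
# Returns a hash of url => body, or url => exception if that fetch failed.
def fetch_in_batches(urls, batch_size: 10, fetcher: DEFAULT_FETCHER)
  results = {}
  urls.each_slice(batch_size) do |batch|
    threads = batch.map do |url|
      Thread.new do
        begin
          [url, fetcher.call(url)]
        rescue StandardError => e
          # Keep the error so one bad site does not abort the whole batch.
          [url, e]
        end
      end
    end
    # Thread#value waits for the thread and returns the block's result.
    threads.each do |thread|
      url, body = thread.value
      results[url] = body
    end
  end
  results
end
```

You would call it with something like fetch_in_batches(Websites.all.map(&:url)) (assuming each record has a url attribute) and then hand each returned body to Nokogiri::HTML. Each batch takes about as long as its slowest site rather than the sum of all of them.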
I'm using EventMachine to do something similar to this for a current project. There is a terrific gem called em-http-request that allows you to make multiple HTTP requests in parallel, and it also provides options for synchronising the responses.
From the em-http-request github docs:
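EventMachine.run {
  # both requests are started without waiting for each other
  http1 = EventMachine::HttpRequest.new('http://google.com/').get
  http2 = EventMachine::HttpRequest.new('http://yahoo.com/').get
  # each callback runs once its response has arrived
  http1.callback { }
  http2.callback { }
}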
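So in your case, you could have:
callbacks = []
Websites.all.each do |website|
  # .get starts the request and returns immediately
  callbacks << EventMachine::HttpRequest.new(website.url).get
end
callbacks.each do |http|
  # parse http.response with Nokogiri inside the callback
  http.callback { }
end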
Run your Rails application with the thin web server in order to get a functioning EventMachine loop:
bundle exec rails server thin
You'll also need the eventmachine and em-http-request gems. Good luck!