I have two scripts that use Mechanize to fetch the Google index page. I assumed EventMachine would be faster than Ruby threads, but it isn't.
The EventMachine code takes: "0.24s user 0.08s system 2% cpu 12.682 total"
The Ruby thread code takes: "0.22s user 0.08s system 5% cpu 5.167 total"
Am I using EventMachine the wrong way?
EventMachine:
require 'rubygems'
require 'mechanize'
require 'eventmachine'
trap("INT") { EM.stop }
EM.run do
  num = 0
  operation = proc {
    agent = Mechanize.new
    sleep 1
    agent.get("http://google.com").body.to_s.size
  }
  callback = proc { |result|
    sleep 1
    puts result
    num += 1
    EM.stop if num == 9
  }
  10.times do
    EventMachine.defer operation, callback
  end
end
Ruby Thread:
require 'rubygems'
require 'mechanize'
threads = []
10.times do
  threads << Thread.new do
    agent = Mechanize.new
    sleep 1
    puts agent.get("http://google.com").body.to_s.size
    sleep 1
  end
end
threads.each do |aThread|
  aThread.join
end
All of the answers in this thread are missing one key point: your callbacks are being run inside the reactor thread instead of in a separate deferred thread. Running Mechanize requests in a defer call is the right way to keep from blocking the loop, but you have to be careful that your callback does not also block the loop.
When you run EM.defer operation, callback, the operation is run inside a Ruby-spawned thread, which does the work, and then the callback is issued inside the main loop. Therefore, the sleep 1 in operation runs in parallel, but the sleep 1 in callback runs serially. This explains the gap in run time (12.7 seconds versus 5.2 seconds): the callback's one-second sleeps run one after another rather than concurrently.
Here's a simplified version of the code you are running.
EM.run {
  times = 0
  work = proc { sleep 1 }
  callback = proc {
    sleep 1
    EM.stop if (times += 1) >= 10
  }
  10.times { EM.defer work, callback }
}
This takes about 12 seconds, which is 1 second for the parallel sleeps, 10 seconds for the serial sleeps, and 1 second for overhead.
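The split between parallel operations and serial callbacks can be reproduced without EventMachine. What follows is a toy model using only the standard library, not EventMachine's actual implementation: run_toy_reactor, the Queue hand-off, and the 0.05-second delay are all assumptions made for illustration. Operations run on worker threads, while every callback runs on the one thread that drains the queue, which stands in for the reactor.

```ruby
# Toy model of a defer-style API (standard library only, NOT EventMachine itself).
# Operations run on worker threads; callbacks run one at a time on the
# "reactor" thread, which here is simply the thread calling run_toy_reactor.
def run_toy_reactor(jobs, delay)
  results = Queue.new          # finished operations wait here for the reactor
  callback_threads = []

  # Blocking work done off the reactor; returns the thread it ran on.
  operation = proc { sleep delay; Thread.current }

  # Blocking work done ON the reactor: these sleeps cannot overlap.
  callback = proc { |worker|
    sleep delay
    callback_threads << Thread.current
    worker
  }

  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  workers = Array.new(jobs) { Thread.new { results << operation.call } }
  operation_threads = Array.new(jobs) { callback.call(results.pop) }
  workers.each(&:join)
  elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started

  {
    elapsed: elapsed,
    callback_threads: callback_threads.uniq,
    operation_threads: operation_threads.uniq
  }
end

report = run_toy_reactor(5, 0.05)
puts format("elapsed: %.2fs", report[:elapsed])
puts "operations ran on #{report[:operation_threads].size} thread(s)"
puts "callbacks ran on #{report[:callback_threads].size} thread(s)"
```

With 5 jobs, the operations overlap but the five callback sleeps add up, so the total is dominated by the callbacks, just as in the 12-second run above.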
To run the callback code in parallel, you have to spawn new threads for it using a proxy callback that uses EM.defer, like so:
EM.run {
  times = 0
  work = proc { sleep 1 }
  callback = proc {
    sleep 1
    EM.stop if (times += 1) >= 10
  }
  proxy_callback = proc { EM.defer callback }
  10.times { EM.defer work, proxy_callback }
}
However, you may run into issues with this if your callback is then supposed to execute code within the event loop, because it is run inside a separate, deferred thread. If this happens, move the problem code into the callback of the proxy_callback proc.
EM.run {
  times = 0
  work = proc { sleep 1 }
  callback = proc {
    sleep 1
    EM.stop_event_loop if (times += 1) >= 5
  }
  proxy_callback = proc { EM.defer callback, proc { "do_eventmachine_stuff" } }
  10.times { EM.defer work, proxy_callback }
}
This version ran in about 3 seconds, which accounts for 1 second of sleeping for operation in parallel, 1 second of sleeping for callback in parallel and 1 second for overhead.
Yep, you're using it wrong. EventMachine works by making asynchronous IO calls that return immediately and notify the "reactor" (the event loop started by EM.run) when they are completed. You have two blocking calls that defeat the purpose of the system, sleep and Mechanize.get. You have to use special asynchronous/non-blocking libraries to derive any value from EventMachine.
You should use something like em-http-request: http://github.com/igrigorik/em-http-request
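To make the reactor idea concrete, here is a minimal sketch of a single-threaded event loop built on IO.select from the standard library. It is not EventMachine or em-http-request: toy_reactor is a hypothetical name, pipes stand in for sockets, and the helper threads only simulate remote servers that answer after a delay. The point is that the loop waits on all connections at once and fires a callback as each one becomes readable.

```ruby
# Toy single-threaded reactor built on IO.select (standard library only).
# Each "connection" is a pipe; a helper thread plays the remote server and
# answers after a delay. The loop never waits on one connection in particular.
def toy_reactor(delays)
  callbacks = {}                 # read end of each pipe => its callback
  received = []

  delays.each_with_index do |delay, i|
    reader, writer = IO.pipe
    Thread.new do                # simulated remote peer, not part of the reactor
      sleep delay
      writer.write("response #{i}")
      writer.close
    end
    callbacks[reader] = proc { |body| received << body }
  end

  until callbacks.empty?
    # Blocks until at least one pipe has data, whichever one that is.
    ready, = IO.select(callbacks.keys)
    ready.each do |io|
      # The peer closes right after writing, so this read returns promptly.
      callbacks.delete(io).call(io.read)
      io.close
    end
  end

  received
end

bodies = toy_reactor([0.2, 0.05, 0.1])
puts bodies.inspect
```

Responses are handled in the order they arrive rather than the order they were requested, and the total wait is governed by the slowest response instead of the sum of all of them.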
EventMachine "defer" actually spawns Ruby threads from a threadpool it manages to handle your request. Yes, EventMachine is designed for non-blocking IO operations, but the defer command is an exception - it's designed to allow you to do long running operations without blocking the reactor.
So, it's going to be a little slower than naked threads, because really it's just launching threads with the overhead of EventMachine's threadpool manager.
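As a rough illustration of what handing work to a managed pool means, here is a toy fixed-size thread pool using only the standard library. ToyPool and timed are hypothetical names and this is not EventMachine's pool. The pool is deliberately smaller than the number of jobs so the queueing is visible; with the 10 jobs in the question, pool size would only matter if EventMachine's pool were smaller than 10, which is an assumption you would need to check.

```ruby
# Toy fixed-size thread pool (standard library only), modelling how a
# defer-style API queues work for a managed pool instead of spawning
# one thread per job. NOT EventMachine's implementation.
class ToyPool
  def initialize(size)
    @jobs = Queue.new
    @workers = Array.new(size) do
      Thread.new do
        while (job = @jobs.pop)  # a nil job tells the worker to exit
          job.call
        end
      end
    end
  end

  # Queue a block; an idle worker picks it up.
  def defer(&job)
    @jobs << job
  end

  # Send one stop marker per worker, then wait for them all to finish.
  def shutdown
    @workers.size.times { @jobs << nil }
    @workers.each(&:join)
  end
end

# Returns the wall-clock seconds taken by the block.
def timed
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  yield
  Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
end

pooled = timed do
  pool = ToyPool.new(2)
  4.times { pool.defer { sleep 0.1 } }
  pool.shutdown
end

naked = timed do
  Array.new(4) { Thread.new { sleep 0.1 } }.each(&:join)
end

puts format("pool of 2: %.2fs, one thread per job: %.2fs", pooled, naked)
```

Four jobs on two workers run in two rounds, while four plain threads all sleep at once, so the pooled run takes roughly twice as long in this contrived setup.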
You can read more about defer here: http://eventmachine.rubyforge.org/EventMachine.html#M000486
That said, fetching pages is a great use of EventMachine, but as other posters have said, you need to use a non-blocking IO library, and then use next_tick or similar to start your tasks, rather than defer, which breaks your task out of the reactor loop.