EventMachine: What is the maximum of parallel HTTP requests EM can handle?_问答_开发者

EventMachine: What is the maximum of parallel HTTP requests EM can handle?

开发者 https://www.devze.com 2023-02-17 12:52 出处：网络

I\'m building a distributed web-crawler and trying to get maximum out of resources of each single machine. I run parsing functions in EventMachine through Iterator and use em-http-request to make asyn

I'm building a distributed web-crawler and trying to get maximum out of resources of each single machine. I run parsing functions in EventMachine through Iterator and use em-http-request to make asynchr开发者_如何学Conous HTTP requests. For now I have 100 iterations that run at the same time and it seems that I can't pass over this level. If I increase a number of iteration it doesn't affect the speed of crawling. However, I get only 10-15% cpu load and 20-30% of network load, so there's plenty of room to crawl faster.

I'm using Ruby 1.9.2. Is there any way to improve the code to use resources effectively or maybe I'm even doing it wrong?

def start_job_crawl     
  @redis.lpop @queue do |link|
    if link.nil?                
      EventMachine::add_timer( 1 ){ start_job_crawl() }
    else   
      #parsing link, using asynchronous http request, 
      #doing something with the content                         
      parse(link)
    end          
  end        
end

#main reactor loop   

EM.run {   
 EM.kqueue   

 @redis = EM::Protocols::Redis.connect(:host => "127.0.0.1")
 @redis.errback do |code|
  puts "Redis error: #{code}"
 end

 #100 parallel 'threads'. Want to increase this     

  EM::Iterator.new(0..99, 100).each do |num, iter| 
      start_job_crawl()    
  end
}

if you are using select()(which is the default for EM), the most is 1024 because select() limited to 1024 file descriptors.

However it seems like you are using kqueue, so it should be able to handle much more than 1024 file descriptors at once.

which is the value of your EM.threadpool_size ?
try enlarging it, I suspect the limit is not in the kqueue but in the pool handling the requests...