I have a god/resque setup that spans a few worker servers. Every so often, the workers get jammed up by long-polling connections and won't time out correctly. We have tried coding around it, but (regardless of why that doesn't work) the keep-alive packets being sent down the wire won't let us time the connection out easily.
I would like certain workers (which I already have segmented out in their own watch blocks) to not be allowed to run for longer than a certain amount of time. In pseudocode, I am looking for a watch condition like the following (i.e. restart that worker if it takes longer than 60 seconds to complete the task):
w.transition(:up, :restart) do |on|
  # :process_timer is the condition I wish existed, not a real god condition
  on.condition(:process_timer) { |c| c.greater_than = 60.seconds }
end
Any thoughts or pointers on how to accomplish this would be greatly appreciated.
require 'timeout'
Timeout::timeout(60) do
...
end
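To make the snippet above a bit more concrete, here is a minimal sketch of wrapping a unit of work in a timeout and handling the `Timeout::Error` it raises. The `run_with_limit` helper and the `:timed_out` return value are names made up for illustration; they are not part of resque or god. Note that `Timeout` interrupts the block from another thread, so it may not help if the job is stuck inside a blocking call in a C extension.

```ruby
require 'timeout'

# Hypothetical helper (not part of resque or god): runs the block and
# returns its value, or :timed_out if it takes longer than `limit` seconds.
def run_with_limit(limit)
  Timeout.timeout(limit) { yield }
rescue Timeout::Error
  :timed_out
end

# A block that finishes in time returns its own value.
fast = run_with_limit(1) { :done }

# A block that overruns the limit is interrupted.
slow = run_with_limit(0.1) { sleep 1; :done }
```

In a resque job, the call to `run_with_limit` would presumably go around the body of `perform`, with the limit set to whatever you consider too long for that queue.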
Although you have an answer I'll drop this here since I already made it:
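class TimedThread
  def initialize(limit, &block)
    @thread = Thread.new { block.call }
    @start = Time.now
    # Watchdog: poll the worker thread and kill it once it exceeds the limit.
    Thread.new do
      while @thread.alive?
        if Time.now - @start > limit
          @thread.kill
          puts "Thread killed"
          break
        end
        sleep 0.05 # avoid a busy loop that pins a CPU core
      end
    end.join
  end
end

# With a 2.5 second limit, the first two blocks finish and the third is killed.
[1, 2, 3].each_with_index do |secs, i|
  TimedThread.new(2.5) { sleep secs; puts "Finished with #{i + 1}" }
end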
As it turns out, there is an example of how to do this in some sample resque files. It's not exactly what I was looking for since it doesn't add an on.condition(:foo), but it is a viable solution:
# This will ride alongside god and kill any rogue stale worker
# processes. Their sacrifice is for the greater good.
WORKER_TIMEOUT = 60 * 10 # 10 minutes
Thread.new do
  loop do
    begin
      `ps -e -o pid,command | grep [r]esque`.split("\n").each do |line|
        parts = line.split(' ')
        next if parts[-2] != "at"
        started = parts[-1].to_i
        elapsed = Time.now - Time.at(started)
        if elapsed >= WORKER_TIMEOUT
          ::Process.kill('USR1', parts[0].to_i)
        end
      end
    rescue
      # don't die because of stupid exceptions
      nil
    end
    # Sleep so we don't run too frequently
    sleep 30
  end
end
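For anyone wondering what the loop above is matching on, here is a small sketch of just the parsing step, pulled out into a function so it can be tried without running `ps` or killing anything. The `stale_worker_pid` name and the sample process lines are assumptions for illustration; the sketch assumes the worker's process title ends in `at <unix timestamp>`, which is what the `parts[-2] != "at"` check implies.

```ruby
# Hypothetical helper: given one line of `ps -e -o pid,command` output,
# return the PID if the line ends in "at <unix timestamp>" and that
# timestamp is at least `timeout` seconds old; otherwise return nil.
def stale_worker_pid(line, now: Time.now, timeout: 60 * 10)
  parts = line.split(' ')
  return nil if parts[-2] != 'at'
  elapsed = now - Time.at(parts[-1].to_i)
  elapsed >= timeout ? parts[0].to_i : nil
end

# Assumed sample lines; real process titles depend on the resque version.
busy_line = '12345 resque-1.9.10: Forked 12346 at 1300000000'
idle_line = '222 resque-1.9.10: Waiting for critical'

# 700 seconds after the fork: past the 10 minute limit.
late  = stale_worker_pid(busy_line, now: Time.at(1_300_000_700))
# 100 seconds after the fork: still within the limit.
early = stale_worker_pid(busy_line, now: Time.at(1_300_000_100))
# A line without the "at <timestamp>" suffix is never considered stale.
idle  = stale_worker_pid(idle_line, now: Time.at(1_300_000_700))
```

Only the PID returned for the stale line would then be passed to `Process.kill('USR1', pid)`, as in the loop above.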
Maybe take a look at resque-restriction? It doesn't appear to be under active maintenance but might do what you need.