I opened up a question for this problem and did not get a thorough enough answer to solve the issue (most likely due to a lack of rigor in explaining my issues which is what I am attempting to correct): Zombie process in python multiprocessing daemon
I am trying to implement a python daemon that uses a pool of workers to executes commands using Popen
. I have borrowed the basic daemon from http://www.jejik.com/articles/2007/02/a_simple_unix_linux_daemon_in_python/
I have only changed the init
, daemonize
(or equally the start
) and stop
methods. Here are the changes to the init
method:
def __init__(self, pidfile):
#, stdin='/dev/null', stdout='STDOUT', stderr='STDOUT'):
#self.stdin = stdin
#self.stdout = stdout
#self.stderr = stderr
self.pidfile = pidfile
self.pool = Pool(processes=4)
I am not setting stdin, stdout and stderr so that I can debug the code with print statements. Also, I have tried moving this pool around to a few places but this is the only place that does not produce exceptions.
Here are the changes to the daemonize
method:
def daemonize(self):
...
# redirect standard file descriptors
#sys.stdout.flush()
#sys.stderr.flush()
#si = open(self.stdin, 'r')
#so = open(self.stdout, 'a+')
#se = open(self.stderr, 'a+', 0)
#os.dup2(si.fileno(), sys.stdin.fileno())
#os.dup2(so.fileno(), sys.stdout.fileno())
#os.dup2(se.fileno(), sys.stderr.fileno())
print self.pool
...
Same thing, I am not redirecting io so that I can debug. The print here is used so that I can check the pools location.
And the stop
method changes:
def stop(self):
...
# Try killing the daemon process
try:
print self.pool
print "closing pool"
self.pool.close()
print "joining pool"
self.pool.join()
print "set pool to None"
self.pool = None
while 1:
print "kill process"
os.kill(pid, SIGTERM)
...
Here the idea is that I not only need to kill the process but also clean up the pool. The self.pool = None
is just a random attempt to solve the issues which didn't work. At first I thought this was a problem with zombie children which was occurring when I had the self.pool.close()
and self.pool.join()
inside the while loop with the os.kill(pid, SIGTERM)
. This is before I decided to start looking at the pool location via the print self.pool
. After doing this, I believe the pools are not the same when the daemon starts and when it stops. Here is some output:
me@pc:~/pyCode/jobQueue$ sudo ./jobQueue.py start
<multiprocessing.pool.Pool object at 0x1c543d0>
me@pc:~/pyCode/jobQueue$ sudo ./jobQueue.py stop
<multiprocessing.pool.Pool object at 0x1fb7450>
closing pool
joining pool
set pool to None
kill process
kill process
... [ stuck in infinite loop]
The different locations of the objects suggest to me that they are not the same pool and that one of them is probably the zombie?
After CTRL+C
, here is what I get from ps aux|grep jobQueue
:
root 21161 0.0 0.0 50384 5220 ? Ss 22:59 0:00 /usr/bin/python ./jobQueue.py start
root 21162 0.0 0.0 0 0 ? Z 开发者_JS百科 22:59 0:00 [jobQueue.py] <defunct>
me 21320 0.0 0.0 7624 940 pts/0 S+ 23:00 0:00 grep --color=auto jobQueue
I have tried moving the self.pool = Pool(processes=4)
to a number of different places. If it is moved to the start()' or
daemonize()methods,
print self.pool` will throw an exception saying that it is NoneType. In addition, the location seems to change the number of zombie process that will pop up.
Currently, I have not added the functionality to run anything via the workers. My problem seems completely related to setting up the pool of workers correctly. I would appreciate any information that leads to solving this issue or advice about creating a daemon service that uses a pool of workers to execute a series of commands using Popen
. Since I haven't gotten that far, I do not know what challenges I face ahead. I am thinking I might just need to write my own pool but if there is a nice trick to make the pool work here, it would be amazing.
The solution is to put the self.pool = Pool(process=4)
as the last line of the daemonize
method. Otherwise the pool ends up getting lost somewhere (perhaps in the fork
s). Then the pool can be access inside the run
method which is overloaded by the application you wish to daemonize. However, the pool cannot be accessed in the stop method and to do so would lead to NoneType exceptions. I believe there is a more elegant solution but this works and it is all I have for now. If I want the stop
to fail when the pool is still in action, I will have to add additional functionality to run
and some form of message but I am not currently concerned with this.
精彩评论