I am designing a monitor process. Its job is to monitor a set of configured processes. When the monitor process detects that a process has gone down, it needs to restart that process.
I am developing the code for my Linux system. Here is how I developed a small prototype:
- Fed the details (path, arguments) of the various processes that need to be monitored.
- The monitor process did the following:
  1. Installed a signal handler for SIGCHLD.
  2. Did a fork and execv to start each process to be monitored, and stored the pid of each child process.
  3. When a child goes down, the parent receives a SIGCHLD.
  4. The signal handler is then called. The handler runs a for loop over the list of pids stored earlier. For each pid, it checks the /proc filesystem for the existence of a directory corresponding to the pid. If the directory doesn't exist, the process is restarted.
Now, my question is this:
- Is the above method (to check the /proc filesystem) a standard or recommended mechanism of checking if a process is running, or should I do something like creating a pipe for the ps command and looping through the output of ps?
- Is there a better way of achieving my requirement?
Regards.
You should not be checking /proc to determine which process has exited - it's possible for another, unrelated, process to start in the meantime and be coincidentally assigned the same PID.
Instead, within your SIGCHLD handler you should use the waitpid() system call, in a loop such as:
int status;
pid_t child;

while ((child = waitpid(-1, &status, WNOHANG)) > 0)
{
    /* Process with PID 'child' has exited, handle it */
}
(The loop is needed because multiple child processes may exit within a short period of time, but only one SIGCHLD may result).
Let's see if I've understood you. You have a list of children, and in your SIGCHLD handler you run a loop over /proc to see which children are still alive, is that right?
That's not very usual, and it's a bit ugly.
What you usually do is run a while ((pid = waitpid(-1, &status, WNOHANG)) > 0)
loop in your SIGCHLD handler, and use the returned pid and the Wxxx macros (WIFEXITED, WEXITSTATUS, WIFSIGNALED, WTERMSIG) to keep your children list up to date. Note the > 0 test: waitpid() returns -1 once there are no children left, which would otherwise keep the loop spinning.
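As a rough illustration of the Wxxx macros, the sketch below turns a status value from waitpid() into a small record that a handler could store in its children list. The names `child_fate`, `child_report` and `classify_status` are hypothetical. The helper uses only the macros and no library calls, so it should be usable inside a signal handler.

```c
#include <sys/types.h>
#include <sys/wait.h>

/* Hypothetical summary of how a child ended. */
enum child_fate { CHILD_EXITED, CHILD_SIGNALED, CHILD_OTHER };

struct child_report {
    enum child_fate fate;
    int detail;              /* exit code, or terminating signal number */
};

/* Decode a status value filled in by wait() or waitpid(). */
struct child_report classify_status(int status)
{
    struct child_report r = { CHILD_OTHER, 0 };

    if (WIFEXITED(status)) {             /* child called exit() or returned from main */
        r.fate = CHILD_EXITED;
        r.detail = WEXITSTATUS(status);
    } else if (WIFSIGNALED(status)) {    /* child was terminated by a signal */
        r.fate = CHILD_SIGNALED;
        r.detail = WTERMSIG(status);
    }
    /* Anything else (stopped or continued) is only reported when
     * WUNTRACED or WCONTINUED is passed to waitpid(). */
    return r;
}
```

A monitor could use the result to decide, for example, to restart a child that was killed by a signal but not one that exited with code 0.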
Notice that wait() and waitpid() are async-signal-safe. The functions you are calling to examine /proc are probably not.
Look into supervisord. It works great.
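You can easily tell if a process is alive by issuing a kill() system call to its pid with signal number 0, which performs the usual existence and permission checks without actually sending a signal. If the process does not exist, kill() fails with errno set to ESRCH. Keep in mind that a child which has exited but has not yet been reaped with wait() or waitpid() is a zombie and still counts as existing.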
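A minimal sketch of the kill() probe, assuming signal number 0 is used so that nothing is actually delivered. The helper name `pid_exists` is made up for illustration. The PID-reuse caveat from the first answer still applies: a positive result only says that some process currently has that pid.

```c
#include <errno.h>
#include <signal.h>
#include <sys/types.h>

/* Hypothetical helper: returns 1 if some process currently has this pid,
 * 0 if none does, and -1 for an invalid argument or unexpected error. */
int pid_exists(pid_t pid)
{
    if (pid <= 0)
        return -1;           /* pid <= 0 addresses process groups, not one process */
    if (kill(pid, 0) == 0)
        return 1;            /* exists and we are allowed to signal it */
    if (errno == EPERM)
        return 1;            /* exists, but we lack permission to signal it */
    if (errno == ESRCH)
        return 0;            /* no such process */
    return -1;
}
```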
Also, calling waitpid() with the WNOHANG option will return zero immediately if the process is still alive.
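The WNOHANG check for one specific child might look like the sketch below. The helper name `child_is_running` is hypothetical. It only works for the caller's own children, and once it reports that the child has exited, the child has been reaped, so a second call for the same pid fails.

```c
#include <sys/types.h>
#include <sys/wait.h>

/* Hypothetical helper: returns 1 if child 'pid' is still running, 0 if it
 * has exited (it is reaped and *status is filled in), and -1 on error,
 * for example ECHILD when 'pid' is not an unreaped child of the caller. */
int child_is_running(pid_t pid, int *status)
{
    pid_t r = waitpid(pid, status, WNOHANG);

    if (r == 0)
        return 1;            /* child exists and has not changed state */
    if (r == pid)
        return 0;            /* child has terminated and was reaped */
    return -1;
}
```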
IMHO, reading proc files or piping to ps is a nasty way to do it.