Monitoring Child processes

If the calling process has no children, waitpid() returns -1 and sets errno to ECHILD. If waitpid() is interrupted by a signal, it returns -1 and sets errno to EINTR.
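A common way to deal with the EINTR case is to restart the call. The sketch below assumes a hypothetical helper name, waitpid_retry(), which is not part of any library; it loops only on EINTR and passes every other result, including ECHILD, back to the caller.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>

/* Hypothetical helper (not a library function): call waitpid() again
 * whenever it fails because a signal interrupted it. Any other result,
 * including -1 with errno set to ECHILD, is returned unchanged. */
pid_t waitpid_retry(pid_t pid, int *status, int options)
{
    pid_t ret;

    do {
        ret = waitpid(pid, status, options);
    } while (ret == -1 && errno == EINTR);

    return ret;
}
```

A caller would use waitpid_retry(-1, &status, 0) wherever it would otherwise call waitpid(-1, &status, 0), and still needs to check for -1 and inspect errno afterwards.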

The wait() system call

  • The wait() system call waits for one of the children of the calling process to terminate and returns the termination status of that child in the buffer pointed to by status.
#include <sys/wait.h>
pid_t wait(int *status);
// Returns process ID of terminated child, or -1 on error

Explain

  • The wait() system call does the following:
    • If no child of the calling process has yet terminated, the call blocks until one of the children terminates. If a child has already terminated by the time of the call, wait() returns immediately.
    • If status is not NULL, information about how the child terminated is returned in the integer to which status points.
    • The kernel adds the process CPU times and resource usage statistics to running totals for all children of this parent process.
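The last point can be observed with getrusage(RUSAGE_CHILDREN, ...), which reports the running totals for children that have been waited for. The following is a minimal sketch, not a definitive measurement technique: the helper names children_cpu_seconds() and cpu_added_by_one_child() are hypothetical, and the code assumes the caller has no other children, so that wait() can only return the child created here.

```c
#include <stdlib.h>
#include <sys/resource.h>
#include <sys/time.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper: total (user + system) CPU seconds charged so far
 * to children that have already been waited for. Returns -1.0 on error. */
static double children_cpu_seconds(void)
{
    struct rusage ru;

    if (getrusage(RUSAGE_CHILDREN, &ru) == -1)
        return -1.0;
    return (double)(ru.ru_utime.tv_sec + ru.ru_stime.tv_sec) +
           (double)(ru.ru_utime.tv_usec + ru.ru_stime.tv_usec) / 1e6;
}

/* Hypothetical demo helper: fork a child that consumes roughly 0.1 s of
 * CPU time, reap it with wait(), and return how much the children's
 * running total grew as a result. Returns -1.0 on error. */
double cpu_added_by_one_child(void)
{
    double before = children_cpu_seconds();
    pid_t pid = fork();

    if (pid == -1)
        return -1.0;
    if (pid == 0) {
        clock_t start = clock();
        while (clock() - start < CLOCKS_PER_SEC / 10)
            ;                       /* busy loop to accumulate CPU time */
        _exit(EXIT_SUCCESS);
    }
    if (wait(NULL) == -1)           /* totals are updated when the child is reaped */
        return -1.0;
    return children_cpu_seconds() - before;
}
```

The child's CPU time is not included in the RUSAGE_CHILDREN totals until the parent has waited for it, which is why the second reading is taken after wait() returns.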

Example

wait_demo.c
#include "dbg.h"
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

#define BUF_SIZE 1000

char *currTime(const char *format);

int main(int argc, char *argv[]) {
    int numDead = 0; // Number of children waited for so far
    pid_t childPid;  // PID of the waited-for child
    int j;

    check(argc >= 2 && (strcmp(argv[1], "--help") != 0),
          "Usage: %s sleep-time...", argv[0]);

    /* Disable buffering of stdout */
    setbuf(stdout, NULL);

    // Create one child for each argument
    for (j = 1; j < argc; j++) {
        switch (fork()) {
        case -1:
            sentinel("fork");
        case 0:
            /* Child sleeps for a while, then exits */
            log_info("[%s] child %d started with PID %ld, sleeping %s seconds",
                     currTime("%T"), j, (long)getpid(), argv[j]);
            sleep((unsigned int)atoi(argv[j]));
            exit(EXIT_SUCCESS);
        default:
            /* Parent carries on creating children */
            break;
        }
    }

    // Reap children one at a time until none remain
    for (;;) {
        childPid = wait(NULL);
        if (childPid == -1) {
            if (errno == ECHILD) {
                log_info("No more children - bye!");
                exit(EXIT_SUCCESS);
            } else {
                sentinel("wait");
            }
        }
        numDead++;
        log_info("[%s] wait() returned child PID %ld (numDead=%d)",
                 currTime("%T"), (long)childPid, numDead);
    }
    return 0;
error:
    return -1;
}

// Assumed helper: returns the current local time formatted with strftime(),
// or NULL on error. The result lives in a static buffer.
char *currTime(const char *format) {
    static char buf[BUF_SIZE];
    time_t t = time(NULL);
    struct tm *tm = localtime(&t);
    size_t s;

    if (tm == NULL)
        return NULL;
    s = strftime(buf, BUF_SIZE, (format != NULL) ? format : "%c", tm);
    return (s == 0) ? NULL : buf;
}

The waitpid() system call

The wait() system call has a number of limitations, which waitpid() was designed to address:

  • If a parent process has created multiple children, it is not possible to wait() for the completion of a specific child; we can only wait for the next child that terminates.
  • If no child has yet terminated, wait() always blocks. Sometimes, it would be preferable to perform a nonblocking wait so that if no child has yet terminated, we obtain an immediate indication of this fact.
  • Using wait(), we can find out only about children that have terminated. It is not possible to be notified when a child is stopped by a signal (such as SIGSTOP or SIGTTIN) or when a stopped child is resumed by delivery of a SIGCONT signal.
#include <sys/wait.h>
pid_t waitpid(pid_t pid, int *status, int options);
// Returns process ID of child, 0 (see text), or -1 on error

Explain

  • The return value and status argument are the same as for wait().

  • The pid argument selects the child to be waited for, as follows:

    • If pid > 0, wait for the child whose process ID equals pid.
    • If pid == 0, wait for any child in the same process group as the caller (parent).
    • If pid < -1, wait for any child whose process group ID equals the absolute value of pid.
    • If pid == -1, wait for any child. The call wait(&status) is equivalent to waitpid(-1, &status, 0).
  • The options argument is a bit mask that can include (OR together) zero or more of the following flags:

| Flag | Description |
| --- | --- |
| WUNTRACED | In addition to returning information about terminated children, also return information when a child is stopped by a signal. |
| WCONTINUED (since Linux 2.6.10) | Also return status information about stopped children that have been resumed by delivery of a SIGCONT signal. |
| WNOHANG | If no child specified by pid has yet changed state, then return immediately, instead of blocking (i.e., perform a "poll"). In this case, the return value of waitpid() is 0. If the calling process has no children that match the specification in pid, waitpid() fails with the error ECHILD. |
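The WNOHANG behaviour can be sketched as a polling loop. The helper name poll_child() and the 10 ms interval are assumptions made for illustration; a real program would normally do useful work between polls instead of sleeping.

```c
#define _POSIX_C_SOURCE 200809L     /* for nanosleep() under strict -std modes */
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>

/* Hypothetical helper (not a library function): poll one child with
 * WNOHANG every 10 ms until it terminates. Returns the number of polls
 * that found nothing to report, or -1 if waitpid() fails (for example
 * with ECHILD when pid is not a child of the caller). */
int poll_child(pid_t pid, int *status)
{
    const struct timespec interval = { 0, 10 * 1000 * 1000 };  /* 10 ms */
    int empty_polls = 0;

    for (;;) {
        pid_t r = waitpid(pid, status, WNOHANG);

        if (r > 0)
            return empty_polls;     /* child terminated; *status is valid */
        if (r == -1)
            return -1;              /* errno says why, e.g. ECHILD */

        /* r == 0: the child exists but has not changed state yet */
        empty_polls++;
        nanosleep(&interval, NULL);
    }
}
```

The three possible return values of waitpid() map directly onto the three branches: a positive PID, 0 for "nothing yet", and -1 for an error.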

Example

#include "dbg.h"
#include <errno.h>
#include <stdlib.h>
#include <sys/wait.h>
#include <unistd.h>

#define N 5

int main(int argc, char *argv[]) {
    int status, i;
    pid_t pid;

    // Create N children; child i exits immediately with status 100 + i
    for (i = 0; i < N; i++) {
        pid = fork();
        check(pid != -1, "fork");
        if (pid == 0) /* Child */
            exit(100 + i);
    }

    // Reap the children in whatever order they terminate
    while ((pid = waitpid(-1, &status, 0)) > 0) {
        /* If the child exited normally, report its exit status */
        if (WIFEXITED(status))
            log_info("Child %ld terminated normally with exit status = %d",
                     (long)pid, WEXITSTATUS(status));
        else
            sentinel("Child %ld terminated abnormally", (long)pid);
    }

    // The only normal termination is if there are no more children
    check(errno == ECHILD, "waitpid error");
    exit(0);
error:
    return -1;
}
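The example above uses pid == -1 and therefore reaps children in whatever order they terminate. As a variant, the pid > 0 case can be sketched by remembering each child's PID and waiting for them one by one in creation order. The helper name reap_in_order() and the MAX_CHILDREN limit are hypothetical.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define MAX_CHILDREN 16             /* arbitrary limit for this sketch */

/* Hypothetical helper: fork n children, where child i exits with status
 * 100 + i, then wait for each one by PID in creation order. On success,
 * exit_codes[i] holds child i's exit status (or -1 if it did not exit
 * normally) and the function returns 0. Returns -1 on error; children
 * created before a failed fork() are not cleaned up in this sketch. */
int reap_in_order(int n, int exit_codes[])
{
    pid_t pids[MAX_CHILDREN];
    int i, status;

    if (n < 0 || n > MAX_CHILDREN)
        return -1;

    for (i = 0; i < n; i++) {
        pids[i] = fork();
        if (pids[i] == -1)
            return -1;
        if (pids[i] == 0)           /* Child */
            _exit(100 + i);
    }

    for (i = 0; i < n; i++) {
        /* pid > 0: wait only for this particular child */
        if (waitpid(pids[i], &status, 0) == -1)
            return -1;
        exit_codes[i] = WIFEXITED(status) ? WEXITSTATUS(status) : -1;
    }
    return 0;
}
```

Because each call names a specific child, the results arrive in creation order even if a later child terminates first.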

The status value

The status value returned by wait() and waitpid() allows us to distinguish the following events for the child:

  • The child terminated by calling _exit() (or exit()), specifying an integer exit status.
  • The child was terminated by the delivery of an unhandled signal.
  • The child was stopped by a signal, and waitpid() was called with the WUNTRACED flag.
  • The child was resumed by a SIGCONT signal, and waitpid() was called with the WCONTINUED flag.

We use the term wait status to encompass all of the above cases. The designation termination status refers to the first two cases; in the shell, the termination status of the last command executed is available in the variable $?.

Although defined as an int, only the bottom 2 bytes of the value pointed to by status are actually used. The way in which these 2 bytes are filled depends on which of the above events occurred for the child, as depicted below:

[Figure: layout of the wait status value for each of the four events listed above]
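The raw value can be inspected directly, although portable code should rely on the macros from <sys/wait.h> instead. The sketch below assumes the Linux encoding, in which a normal exit stores the exit status in bits 8 to 15 and a signal termination stores the signal number in bits 0 to 6 (with bit 7 as the core dump flag). The helper names are hypothetical.

```c
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork a child that calls _exit(code), reap it, and
 * return the raw wait status (or -1 on error). */
int raw_status_after_exit(int code)
{
    int status;
    pid_t pid = fork();

    if (pid == -1)
        return -1;
    if (pid == 0)
        _exit(code);
    return (waitpid(pid, &status, 0) == pid) ? status : -1;
}

/* Hypothetical helper: fork a child that sends itself the signal sig,
 * reap it, and return the raw wait status (or -1 on error). */
int raw_status_after_signal(int sig)
{
    int status;
    pid_t pid = fork();

    if (pid == -1)
        return -1;
    if (pid == 0) {
        raise(sig);                 /* child signals itself */
        _exit(EXIT_FAILURE);        /* reached only if the signal did not kill it */
    }
    return (waitpid(pid, &status, 0) == pid) ? status : -1;
}
```

Under the Linux encoding assumed here, raw_status_after_exit(7) yields a value whose second byte is 7 and whose low byte is 0, while raw_status_after_signal(SIGKILL) yields a value whose low 7 bits hold the number of SIGKILL.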

The <sys/wait.h> header file defines a standard set of macros that can be used to dissect a wait status value. When applied to a status value returned by wait() or waitpid(), only one of the WIFEXITED(), WIFSIGNALED(), WIFSTOPPED(), and WIFCONTINUED() macros will return true. Additional macros are provided to further dissect the status value, as noted in the list below.

| Macro | Description |
| --- | --- |
| WIFEXITED(status) | Returns true if the child process exited normally. |
| WEXITSTATUS(status) | Only defined if WIFEXITED(status) returned true. Returns the exit status of a normally terminated child. |
| WIFSIGNALED(status) | Returns true if the child process was killed by a signal. |
| WTERMSIG(status) | Only defined if WIFSIGNALED(status) returned true. Returns the number of the signal that caused the child process to terminate. |
| WCOREDUMP(status) | Only defined if WIFSIGNALED(status) returned true. Returns true if the child process produced a core dump file. |
| WIFSTOPPED(status) | Returns true if the child process was stopped by a signal. |
| WSTOPSIG(status) | Only defined if WIFSTOPPED(status) returned true. Returns the number of the signal that stopped the process. |
| WIFCONTINUED(status) | Returns true if the child was resumed by delivery of SIGCONT. |