I was working on a TCP server and ran into something very strange.
When I connect with one client, everything works fine.
When two or more clients connect, traffic is still fine. But when any one of the clients disconnects, the server jams right after the reaper call and just sits there waiting. I discovered this when I disconnected one of two clients and tried to reconnect. The reconnecting client shows no error messages at all and appears to run smoothly. Packets can be sent to the server, but they pile up inside it.
The server, meanwhile, hangs until one specific client disconnects. Once that client disconnects, the server resumes and processes all the requests that had piled up.
Below is the bare-bones code I used for the server structure.
It also demonstrates the problem described above. Can anyone please point out where the error is?
#include <arpa/inet.h>
#include <errno.h>
#include <netdb.h>
#include <netinet/in.h>
#include <signal.h>
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>
void reaper(int sig)
{
int status;
while (waitpid(-1, &status, WNOHANG) >= 0)
/* empty */;
}
int errexit(const char *format, ...)
{
va_list args;
va_start(args, format);
vfprintf(stderr, format, args);
va_end(args);
exit(1);
}
/* errno comes from <errno.h>; it must not be declared by hand */
unsigned short portbase = 0; /* port base, for non-root servers */
int passivesock(const char *service, const char *transport, int qlen)
{
struct servent *pse; /* pointer to service information entry */
struct protoent *ppe; /* pointer to protocol information entry*/
struct sockaddr_in sin; /* an Internet endpoint address */
int s, type; /* socket descriptor and socket type */
memset(&sin, 0, sizeof(sin));
sin.sin_family = AF_INET;
sin.sin_addr.s_addr = INADDR_ANY;
/* Map service name to port number */
if ( pse = getservbyname(service, transport) )
sin.sin_port = htons(ntohs((unsigned short)pse->s_port)
+ portbase);
else if ((sin.sin_port=htons((unsigned short)atoi(service))) == 0)
errexit("can't get \"%s\" service entry\n", service);
/* Map protocol name to protocol number */
if ( (ppe = getprotobyname(transport)) == 0)
errexit("can't get \"%s\" protocol entry\n", transport);
/* Use protocol to choose a socket type */
if (strcmp(transport, "udp") == 0)
type = SOCK_DGRAM;
else
type = SOCK_STREAM;
/* Allocate a socket */
s = socket(PF_INET, type, ppe->p_proto);
if (s < 0)
errexit("can't create socket: %s\n", strerror(errno));
/* Bind the socket */
if (bind(s, (struct sockaddr *)&sin, sizeof(sin)) < 0)
errexit("can't bind to %s port: %s\n", service,
strerror(errno));
if (type == SOCK_STREAM && listen(s, qlen) < 0)
errexit("can't listen on %s port: %s\n", service,
strerror(errno));
return s;
}
int passiveTCP(const char *service, int qlen)
{
return passivesock(service, "tcp", qlen);
}
#define QLEN 32 /* maximum connection queue length */
#define BUFSIZE 4096
int TCPechod(int fd);
int main(int argc, char *argv[])
{
char *service; /* service name or port number */
struct sockaddr_in fsin; /* the address of a client */
socklen_t alen; /* length of client's address */
int msock; /* master server socket */
int ssock; /* slave server socket */
if (argc !=2)
errexit("usage: %s port\n", argv[0]);
service = argv[1];
msock = passiveTCP(service, QLEN);
(void) signal(SIGCHLD, reaper);
while (1) {
alen = sizeof(fsin);
ssock = accept(msock, (struct sockaddr *)&fsin, &alen);
if (ssock < 0) {
if (errno == EINTR)
continue;
errexit("accept: %s\n", strerror(errno));
}
printf("Accept connection %d from %s:%d\n", ssock, inet_ntoa(fsin.sin_addr), (int)ntohs(fsin.sin_port));
switch (fork()){
case 0:
(void) close(msock);
TCPechod(ssock);
close(ssock);
exit(0);
default:
close(ssock);
break;
case -1:
errexit("fork: %s\n", strerror(errno));
}
}
}
int TCPechod(int fd)
{
char buf[BUFSIZE];
int cc;
while (cc = read(fd, buf, sizeof(buf))) {
if (cc < 0)
errexit("echo read: %s\n", strerror(errno));
if (write(fd, buf, cc) < 0)
errexit("echo write: %s\n", strerror(errno));
}
return 0;
}
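Any pointers would be greatly appreciated.
Thank you in advance.
The problem is how you call waitpid: your loop only exits when waitpid reports an error (a return value < 0). When called with the WNOHANG flag, waitpid returns 0 if child processes still exist but none of them has changed state (stopped, resumed, or terminated). So once one client's child has been reaped while another client is still connected, waitpid keeps returning 0 and the handler spins inside the loop until the last remaining child exits; only then does waitpid fail with ECHILD and let the server get back to accept. Try this correction: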
void reaper(int sig)
{
    int status;
    pid_t pid;
    /* printf is not async-signal-safe; it is used here only for illustration */
    while ((pid = waitpid(-1, &status, WNOHANG)) > 0)
        printf("Process PID %d has finished with status %d\n", (int)pid, status);
    if (0 == pid) printf("No more processes waiting\n");
    if (pid < 0) printf("An error occurred\n");
}
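To make the three return values concrete, here is a minimal sketch in C. The helper name probe_waitpid and the pipe used to hold the child open are my own assumptions for illustration, not part of the original server. It also assumes the calling process has no other children and leaves SIGCHLD at its default disposition.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper: forks one child and records what
 * waitpid(-1, ..., WNOHANG) reports at three moments.
 * results[0]: raw return value while the child is still running
 * results[1]: 1 if the exited child's pid was returned
 * results[2]: 1 if the final call failed with ECHILD
 * Returns 0 on success, -1 if pipe() or fork() failed. */
int probe_waitpid(int results[3])
{
    int gate[2];
    int status;
    pid_t child, r;
    struct timespec pause_1ms = {0, 1000000L};

    if (pipe(gate) < 0)
        return -1;
    child = fork();
    if (child < 0) {
        close(gate[0]);
        close(gate[1]);
        return -1;
    }
    if (child == 0) {
        char c;
        close(gate[1]);
        /* Blocks until the parent closes its write end (EOF). */
        if (read(gate[0], &c, 1) < 0)
            _exit(1);
        _exit(0);
    }
    close(gate[0]);

    /* Case 1: a child exists but has not exited, so WNOHANG yields 0.
     * This is the value that keeps the ">= 0" loop spinning. */
    results[0] = (int)waitpid(-1, &status, WNOHANG);

    close(gate[1]); /* let the child see EOF and exit */

    /* Case 2: poll until the child has exited; its pid (> 0) comes back. */
    while ((r = waitpid(-1, &status, WNOHANG)) == 0)
        nanosleep(&pause_1ms, NULL);
    results[1] = (r == child);

    /* Case 3: no children are left, so the call fails with ECHILD. */
    r = waitpid(-1, &status, WNOHANG);
    results[2] = (r == -1 && errno == ECHILD);
    return 0;
}
```

Calling probe_waitpid from a small main and printing the three slots should show 0 for the running child, then a match on the child's pid, then the ECHILD failure. In the server from the question, case 1 is what the handler sees while the second client is still connected.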
If you want to use wait instead, the reaper function should look something like this:
void reaper(int sig)
{
    int status;
    pid_t pid;
    pid = wait(&status); /* wait() suspends the calling process until a child changes state */
    if (pid > 0) printf("Process PID %d has finished with status %d\n", (int)pid, status);
    if (pid < 0) printf("An error occurred\n");
}
For additional information about wait(2) go to: http://linux.die.net/man/2/wait
I ran into this problem as well.
The "reaper" function with the >= 0 test appears in examples all over the place, but it can turn into an endless loop: even after it has cleaned up the child, it keeps looping, not until there are no more exited children, but until waitpid returns an error of some kind.
The Perl versions of this code that are out there are usually "fixed" by using > 0 instead of >= 0, but you may want to use the logic shown here, where you explicitly test for the cases of interest.
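For what it's worth, here is a rough sketch of a handler that tests each case explicitly. Several things in it are my own assumptions rather than anything from the question: it installs the handler with sigaction and SA_RESTART instead of signal, it saves and restores errno, and the counter reaped_children and the helpers install_reaper and reap_demo exist only to make the effect observable.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical counter, present only so the result can be inspected. */
static volatile sig_atomic_t reaped_children = 0;

static void reaper(int sig)
{
    int saved_errno = errno; /* waitpid may overwrite errno of the interrupted code */
    pid_t pid;

    (void)sig;
    for (;;) {
        pid = waitpid(-1, NULL, WNOHANG);
        if (pid > 0) {       /* reaped one child; look for more */
            reaped_children++;
            continue;
        }
        /* pid == 0: children exist but none has exited.
         * pid < 0: ECHILD (no children at all) or another error.
         * Either way there is nothing left to reap right now. */
        break;
    }
    errno = saved_errno;
}

/* Install the handler; SA_RESTART lets an interrupted accept() resume. */
int install_reaper(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = reaper;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART | SA_NOCLDSTOP;
    return sigaction(SIGCHLD, &sa, NULL);
}

/* Demo: fork n short-lived children and sleep until all were reaped.
 * Returns the number reaped, or -1 on failure. */
int reap_demo(int n)
{
    sigset_t block, old;
    int i;

    if (install_reaper() < 0)
        return -1;
    reaped_children = 0;
    /* Block SIGCHLD so no signal slips in between the check and sigsuspend. */
    sigemptyset(&block);
    sigaddset(&block, SIGCHLD);
    sigprocmask(SIG_BLOCK, &block, &old);
    for (i = 0; i < n; i++) {
        pid_t pid = fork();
        if (pid < 0) {
            sigprocmask(SIG_SETMASK, &old, NULL);
            return -1;
        }
        if (pid == 0)
            _exit(0);
    }
    while (reaped_children < n)
        sigsuspend(&old); /* atomically unblock SIGCHLD and wait for a signal */
    sigprocmask(SIG_SETMASK, &old, NULL);
    return (int)reaped_children;
}
```

Several children can exit while a single SIGCHLD is pending, which is why the handler still loops on pid > 0. The important part is that it stops as soon as waitpid returns 0 instead of spinning until the last child is gone.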