David Cook
2004-01-02 18:02:23 UTC
Aloha fellow xinetd users...
We've identified a potential problem in the current version of xinetd (2.3.12)
that may be a defect in the code itself, or possibly a new form of DoS attack.
We see the following on our console:
Jan 1 17:14:31 puna xinetd[16732]: [ID 385394 daemon.error] Deactivating service ftp due to excessive incoming connections. Restarting in 15 seconds.
Jan 1 17:14:46 puna xinetd[16732]: [ID 385394 daemon.error] Activating service ftp
Jan 1 17:15:41 puna xinetd[16732]: [ID 254256 daemon.error] service pop3, accept: Too many open files (errno = 24)
Jan 1 17:15:45 last message repeated 52347 times
If you look at the times... you can see that a user hitting our FTP caused
it to deactivate at 17:14:31 and then it came back up (as it should) 15
seconds later. I did a netstat and confirmed that indeed, a user from France
had been bombarding our FTP ports - so that's correct.
However, approximately one minute later you can see that xinetd reports that
the pop3 accept failed with too many open files (errno = 24). Looking at our
netstat, we do NOT see any flood of popper hits associated with this.
When we get the ACCEPT error, xinetd enters an infinite loop in which it pounds
syslogd with that error message (note that it repeated 52,347 times). BTW, we're
running Solaris 8 (5.8). Both syslogd and xinetd start running up huge
CPU loads. Killing xinetd clears the problem, and restarting it once the load
drops makes things OK again.
We first noted this happening about two months ago, and at that time it appeared
to recur roughly every 25 days. This week, though, we got it several times,
including times when the server was at a lull (like Jan 1) - ALL of
them were accompanied by someone hitting the FTP port and causing the
deactivation/reactivation.
In looking at the xinetd source code - the file 'connection.c' has the
following chunk (starting at line 50):
   if( SC_WAITS( scp ) ) {
      cp->co_descriptor = SVC_FD( sp );
   } else {
      cp->co_descriptor = accept( SVC_FD( sp ), &(cp->co_remote_address.sa),
                                  &sin_len ) ;
      M_SET( cp->co_flags, COF_NEW_DESCRIPTOR ) ;
   }

   if ( cp->co_descriptor == -1 )
   {
      msg( LOG_ERR, func, "service %s, accept: %m", SVC_ID( sp ) ) ;
      return( FAILED ) ;
   }
Note the 'accept:' in the 'msg' - this is the only place we could find in
the source where that string appears - so we're 99.999% certain this is
the line issuing the error.
Error number 24 is EMFILE ("The per-process descriptor table is full"),
indicating that no more descriptors were available when the accept() call
was invoked.
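For reference, a process can check what that per-process limit actually is
with getrlimit(). The following is just a stand-alone illustration, not part
of xinetd:

   #include <stdio.h>
   #include <sys/resource.h>

   /* Print the soft and hard limits on open descriptors (RLIMIT_NOFILE).
    * Once a process has rlim_cur descriptors open, accept() starts
    * failing with EMFILE (errno 24). */
   int main(void)
   {
      struct rlimit rl;

      if (getrlimit(RLIMIT_NOFILE, &rl) == -1) {
         perror("getrlimit");
         return 1;
      }
      printf("soft limit: %lu, hard limit: %lu\n",
             (unsigned long) rl.rlim_cur, (unsigned long) rl.rlim_max);
      return 0;
   }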
If you follow how this routine is called, you will find it is reached through
a series of routines that start in the infinite loop in main(). What happens
in this case is that the return(FAILED) sends control straight back around
that loop to try again, so xinetd is effectively busy-looping.
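To make the failure mode concrete, here is a much simplified model of what we
believe is happening (this is a sketch, not xinetd's actual code): when
accept() fails with EMFILE it does not dequeue the pending connection, so the
listening socket stays readable and the very next select() returns at once.

   #include <errno.h>
   #include <string.h>
   #include <sys/select.h>
   #include <sys/socket.h>
   #include <syslog.h>
   #include <unistd.h>

   /* Simplified model of the spin: if accept() fails with EMFILE the
    * pending connection stays in the listen queue, so select() reports
    * the listening fd readable again right away and every iteration
    * logs the same error -- hence the 52,347 repeats above. */
   void serve_forever(int listen_fd)
   {
      for (;;) {
         fd_set readable;
         int conn;

         FD_ZERO(&readable);
         FD_SET(listen_fd, &readable);

         if (select(listen_fd + 1, &readable, NULL, NULL, NULL) <= 0)
            continue;

         conn = accept(listen_fd, NULL, NULL);
         if (conn == -1) {
            syslog(LOG_ERR, "accept: %s", strerror(errno));
            continue;          /* nothing was consumed: the loop spins */
         }
         /* ... hand the connection off to the service ... */
         close(conn);
      }
   }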
We have placed a TEMPORARY fix - to avoid the load going high while we're not
monitoring - by changing the if that detects the error to this:
   if ( cp->co_descriptor == -1 )
   {
      if ( errno == 24 ) {   /* 24 == EMFILE: no descriptors left */
         msg( LOG_ERR, func, "XINETD STOPPED DUE TO NO MORE DESCRIPTORS" ) ;
         exit( 1 ) ;
      }
      msg( LOG_ERR, func, "service %s, accept: %m", SVC_ID( sp ) ) ;
      return( FAILED ) ;
   }
You can see here that we are looking specifically for error number 24, and
if we see it we issue a message to the log and exit - hopefully that SHOULD
stop xinetd and break the tight loop. Since I put this minor change in last
night we haven't had the occurrence yet, so I have yet to prove that it
works. (If it does work, I intend to place a sleep(10) at the start
of xinetd's main and have that error path also invoke another copy
of xinetd before exiting - that should keep it up and running.)
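For what it's worth, that respawn idea could look roughly like the sketch
below. This is purely illustrative - the binary path and argument handling
are placeholders, and it is not something xinetd does today:

   #include <stdlib.h>
   #include <sys/types.h>
   #include <unistd.h>

   /* Hypothetical "respawn before exiting" helper: fork a fresh copy of
    * the daemon (which, with a sleep(10) at the top of main(), waits for
    * the load to drop before taking over), then let the old, descriptor-
    * starved process exit and release its descriptors. */
   static void respawn_and_exit(char *const argv[])
   {
      pid_t pid = fork();

      if (pid == 0) {
         execv("/usr/sbin/xinetd", argv);   /* path is a placeholder */
         _exit(127);                        /* exec failed */
      }
      exit(1);                              /* old instance goes away */
   }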
Anyone else experiencing this problem and is there any known fix?
Mahalo