Thread: BUG: possible busy loop when connection is closed while trying to establish SSL connection
BUG: possible busy loop when connection is closed while trying to establish SSL connection
From
Hannu Krosing
Date:
We were bitten by the following bug a few times, when our server tried to reestablish connections under bad network conditions: if connection is closed while trying to get response to SSL setup packet (i.e. conn->status is CONNECTION_SSL_STARTUP), we get a busy loop, as line 1035 in 8.0.0.beta2: if (pqWaitTimed(1, 0, conn, finish_time) { tells that there is data to read (returns 0) while actually it is error (POLLERR & POLLHUP) and not POLLIN returned from poll() and nread = recv(conn->sock, &SSLok, 1, 0); on line 1449 sets nread to 0, as there is in fact nothing to read from a closed socket after that the check on line 1462: if (nread == 0) /* caller failed to wait for data */ return PGRES_POLLING_READING; resumes the busy loop in all other places recv()'ing is done throught pqsecure_read, which checks for such condition. I'm not familiar enough with code to suggest a proper fix (and I quickfix our app by disabling SSL (PGSSLMODE=disable) - the default was prefer ;), but this may come back to haunt others -------------- Hannu
Re: BUG: possible busy loop when connection is closed while trying to establish SSL connection
From
Tom Lane
Date:
Hannu Krosing <hannu@tm.ee> writes: > We were bitten by the following bug a few times, when our server tried > to reestablish connections under bad network conditions: > > if connection is closed while trying to get response to SSL setup packet > (i.e. conn->status is CONNECTION_SSL_STARTUP), we get a busy loop, as > line 1035 in 8.0.0.beta2: > > if (pqWaitTimed(1, 0, conn, finish_time) { > > tells that there is data to read (returns 0) while actually it is error > (POLLERR & POLLHUP) and not POLLIN returned from poll() and This is intentional: the idea is that we should go ahead and do the read (or write), which will detect the error condition on the socket. poll() in itself doesn't give enough information to determine what the error condition is, so it's not appropriate to fail here. > after that the check on line 1462: > > if (nread == 0) > /* caller failed to wait for data */ > return PGRES_POLLING_READING; > > resumes the busy loop This seems to me to be the bug. pqReadData jumps through hoops to determine whether a zero-length read means EOF or not, and I think we need to expend some effort to determine that here too. One possibility is to forget the direct call to recv() and use pqReadData --- since conn->ssl isn't set yet, and we aren't expecting the server to send more than one byte, this should in theory be safe. Comments? regards, tom lane
On N, 2004-09-23 at 06:41, Tom Lane wrote: > Hannu Krosing <hannu@tm.ee> writes: > > We were bitten by the following bug a few times, when our server tried > > to reestablish connections under bad network conditions: > > > > if connection is closed while trying to get response to SSL setup packet > > (i.e. conn->status is CONNECTION_SSL_STARTUP), we get a busy loop, as > > line 1035 in 8.0.0.beta2: > > > > if (pqWaitTimed(1, 0, conn, finish_time) { > > > > tells that there is data to read (returns 0) while actually it is error > > (POLLERR & POLLHUP) and not POLLIN returned from poll() and at least on linux it does, we got the following trace: poll([{fd=11, events=POLLIN|POLLERR, revents=POLLIN|POLLERR|POLLHUP}], 1, -1) = 1 recv(11, "", 1, 0) = 0 poll([{fd=11, events=POLLIN|POLLERR, revents=POLLIN|POLLERR|POLLHUP}], 1, -1) = 1 recv(11, "", 1, 0) = 0 poll([{fd=11, events=POLLIN|POLLERR, revents=POLLIN|POLLERR|POLLHUP}], 1, -1) = 1 recv(11, "", 1, 0) = 0 which seems to say that poll came back on POLLHUP, and as there is just one fd, it must mean that this one fd is closed. But this may be non-portable > This is intentional: the idea is that we should go ahead and do the read > (or write), which will detect the error condition on the socket. poll() > in itself doesn't give enough information to determine what the error > condition is, so it's not appropriate to fail here. > > > after that the check on line 1462: > > > > if (nread == 0) > > /* caller failed to wait for data */ > > return PGRES_POLLING_READING; > > > > resumes the busy loop > > This seems to me to be the bug. pqReadData jumps through hoops to > determine whether a zero-length read means EOF or not, and I think we > need to expend some effort to determine that here too. > > One possibility is to forget the direct call to recv() and use > pqReadData --- since conn->ssl isn't set yet, and we aren't expecting > the server to send more than one byte, this should in theory be safe. I was scared by the comment before recv(...,1,0) which said we must be careful not to read more than 1 byte Is it impossible to not accidentally get more than one and screw up SSL handshake ? ------------- Hannu
Re: BUG: possible busy loop when connection is closed while trying to establish SSL connection
From
Tom Lane
Date:
Hannu Krosing <hannu@tm.ee> writes: >> One possibility is to forget the direct call to recv() and use >> pqReadData --- since conn->ssl isn't set yet, and we aren't expecting >> the server to send more than one byte, this should in theory be safe. > I was scared by the comment before recv(...,1,0) which said we must be > careful not to read more than 1 byte When I wrote that, I was trying to assume as little as possible about the SSL protocol. The only way there could be a problem is if the server is first to send during the SSL negotiation handshake; which seems odd but not impossible. Anyone know for sure? regards, tom lane
Dear Tom, > When I wrote that, I was trying to assume as little as possible about > the SSL protocol. The only way there could be a problem is if the > server is first to send during the SSL negotiation handshake; which > seems odd but not impossible. Anyone know for sure? As for the RFC, the client is the first to speak. In the server handcheck response (server hello), the server must tell what version of the protocol is to be used (best that it knows of, ceiled by the client version), whether it accepts the previous session of the client and what algorithms are chosen among those suggested by the client. ISTM that this cannot be sent before the client hello message is received... or the server does not really implement SSL. Now if you connect to some other server with some other protocol, that is another issue... Also, I do not know how the postgresql protocol interacts with SSL... I guess the server waits for the first packet to decided whether it is a SSL connection or a non-SSL connection? My 0.02 EUR. Have a nice day, -- Fabien Coelho - coelho@cri.ensmp.fr
Re: BUG: possible busy loop when connection is closed while trying to establish SSL connection
From
Tom Lane
Date:
Fabien COELHO <coelho@cri.ensmp.fr> writes: > Now if you connect to some other server with some other protocol, that is > another issue... But the code in question is only for SSL connection to PG, so that's a red herring I think. > Also, I do not know how the postgresql protocol interacts > with SSL... I guess the server waits for the first packet to decided > whether it is a SSL connection or a non-SSL connection? It'sClient sends SSL request message Server sends back 1 byte indicating if it can do SSL or not << SSL startup protocol happens here >> Client sends normal PG connection request message Authentication exchange proceeds as usual If the server were first to send during the SSL startup protocol, then it's possible that more than one byte would be waiting for the client when it reads the server response. We'd have no easy way to "push back" those bytes and allow them to be re-read by the client-side SSL library. So that was the danger I was concerned about. If the client is first to send during the SSL startup protocol, then there's no problem: there can only be one byte waiting for us. regards, tom lane
> If the client is first to send during the SSL startup protocol, then > there's no problem: there can only be one byte waiting for us. So it looks good from the SSL perspective. -- Fabien Coelho - coelho@cri.ensmp.fr