Re: BUG: possible busy loop when connection is closed - Mailing list pgsql-hackers

From Hannu Krosing
Subject Re: BUG: possible busy loop when connection is closed
Date
Msg-id 1095927167.3552.11.camel@fuji.krosing.net
In response to Re: BUG: possible busy loop when connection is closed while trying to establish SSL connection  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: BUG: possible busy loop when connection is closed while trying to establish SSL connection  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
On Thu, 2004-09-23 at 06:41, Tom Lane wrote:
> Hannu Krosing <hannu@tm.ee> writes:
> > We were bitten by the following bug a few times, when our server tried
> > to reestablish connections under bad network conditions:
> >
> > if the connection is closed while waiting for the response to the SSL setup
> > packet (i.e. conn->status is CONNECTION_SSL_STARTUP), we get a busy loop,
> > as line 1035 in 8.0.0.beta2:
> >
> >     if (pqWaitTimed(1, 0, conn, finish_time))
> >
> > reports that there is data to read (returns 0) when poll() actually
> > returned an error (POLLERR & POLLHUP) rather than POLLIN, and

At least on Linux it does; we got the following trace:
    poll([{fd=11, events=POLLIN|POLLERR, revents=POLLIN|POLLERR|POLLHUP}], 1, -1) = 1
    recv(11, "", 1, 0)   = 0
    poll([{fd=11, events=POLLIN|POLLERR, revents=POLLIN|POLLERR|POLLHUP}], 1, -1) = 1
    recv(11, "", 1, 0)   = 0
    poll([{fd=11, events=POLLIN|POLLERR, revents=POLLIN|POLLERR|POLLHUP}], 1, -1) = 1
    recv(11, "", 1, 0)   = 0
This seems to say that poll() came back because of POLLHUP, and since there
is just one fd, that fd must be closed. But this may be non-portable.

> This is intentional: the idea is that we should go ahead and do the read
> (or write), which will detect the error condition on the socket.  poll()
> in itself doesn't give enough information to determine what the error
> condition is, so it's not appropriate to fail here.
> 
> > after that the check on line 1462:
> >
> >     if (nread == 0)
> >         /* caller failed to wait for data */
> >         return PGRES_POLLING_READING;
> >
> > resumes the busy loop.
> 
> This seems to me to be the bug.  pqReadData jumps through hoops to
> determine whether a zero-length read means EOF or not, and I think we
> need to expend some effort to determine that here too.
> 
> One possibility is to forget the direct call to recv() and use
> pqReadData --- since conn->ssl isn't set yet, and we aren't expecting
> the server to send more than one byte, this should in theory be safe.

I was scared off by the comment before recv(..., 1, 0), which says we must
be careful not to read more than 1 byte.

Can we be sure it won't accidentally read more than one byte and screw up
the SSL handshake?

-------------
Hannu


