Re: BUG #5837: PQstatus() fails to report lost connection - Mailing list pgsql-bugs

From Robert Haas
Subject Re: BUG #5837: PQstatus() fails to report lost connection
Date
Msg-id AANLkTi=rSATsN+3UD5feWm3ntsB=6jqAFBZWdTZ7ZfaO@mail.gmail.com
Whole thread Raw
In response to BUG #5837: PQstatus() fails to report lost connection  ("Murray S. Kucherawy" <msk@cloudmark.com>)
Responses Re: BUG #5837: PQstatus() fails to report lost connection
List pgsql-bugs
On Sun, Jan 23, 2011 at 10:59 PM, Murray S. Kucherawy <msk@cloudmark.com> w=
rote:
> As for the reply above, I disagree. =A0PQstatus(), as documented, doesn't=
 say anything about certain conditions in which it won't report that the co=
nnection is dead, when it actually is, once the connection was already esta=
blished and working.

I understand that gut feeling, but I think if you think about it a
little, you may realize that at some level it's an unreasonable
expectation.  For example, suppose you were to connect to the server
(successfully), unplug the network cable, and then call PQstatus().
There is literally no way for PQstatus() to know whether that
connection is still there.  It's just returning some internal flag out
of the object.  And even if we wanted it to do something else, what?
Send a keepalive packet of some sort and wait for a response?  Given
default TCP parameter settings, that could take many minutes to give
an answer - which is probably not what people want when they call
PQstatus() - and it would introduce a lot of otherwise unnecessary
network traffic for people who want to query the internal state of the
PGconn, not ping the server.

Now, in this case, things are a bit less clear, because it's not as if
the remote side has given us no indication that the connection is
about to get closed.  This doesn't strike me as awesome protocol
design, but 14 years on we're probably stuck with it. I think if you
want to have some special handling when an administrative shutdown
happens on the server, you should use PQresultErrorField() to check
PG_DIAG_SQLSTATE; or if you want FATAL and PANIC conditions to be
handled specially you can check PG_DIAG_SEVERITY.

> Moreover, the description of PQgetResult() doesn't say or illustrate anyt=
hing about proper use of it in this context, so how would a reader know he/=
she got it wrong? =A0The documentation I can find online of PQgetResult() d=
oesn't enumerate the conditions where PQstatus() gives a false indication o=
f whether or not a reconnect is required, nor does any part of the document=
ation I could find state that PGRES_FATAL_ERROR always implies the connecti=
on is no longer usable and must be re-established; "FATAL" could be referri=
ng to the transaction/request, not the connection.
>
> So, if this isn't a bug, then I think the documentation needs a bit of wo=
rk in this area.

The description of PGRES_FATAL_ERROR in the documentation does pretty
much suck.  I believe you'll get that if the backend returns either an
ERROR (in which case you need to retry the transaction) or a FATAL (in
which case you need to reset the connection), but that's not at all
clear either from the documentation or the naming of the constant
(which, alas, is hard to change at this point for
backward-compatibility reasons).

The description of PQstatus() looks correct to me.  It says that "a
communications failure might result in the status changing to
CONNECTION_BAD prematurely".  In the scenario you describe, no
communications failure has occurred.  The server sent back an error
message, and closed the connection (though libpq hasn't noticed yet)
but there's no communications error anywhere until the client tries to
send a query over a connection that doesn't exist any more.  Maybe we
could flesh that description out a bit to make it more clear that this
is really only going to tell you if TCP/IP has explicitly blown up in
your face, and not any other reason why things might not be working,
although I think there are hints of that there already.

--=20
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

pgsql-bugs by date:

Previous
From: Robert Haas
Date:
Subject: Re: BUG #5837: PQstatus() fails to report lost connection
Next
From: Xiaobo Gu
Date:
Subject: Re: [HACKERS] Is there a way to build PostgreSQL client libraries with MinGW