Thread: pgsql: libpq: Notice errors a backend may have sent just before dying.

pgsql: libpq: Notice errors a backend may have sent just before dying.

From
Robert Haas
Date:
libpq: Notice errors a backend may have sent just before dying.

At least since the introduction of Hot Standby, the backend has
sometimes sent fatal errors even when no client query was in
progress, assuming that the client would receive it.  However,
pqHandleSendFailure was not in sync with this assumption, and
only tries to catch notices and notifies.  Add a parseInput call
to the loop there to fix.

Andres Freund suggested the fix.  Comments are by me.
Reviewed by Michael Paquier.

Branch
------
master

Details
-------
http://git.postgresql.org/pg/commitdiff/c3e7c24a1d60dc6ad56e2a0723399f1570c54224

Modified Files
--------------
src/interfaces/libpq/fe-exec.c |   16 ++++++++--------
1 file changed, 8 insertions(+), 8 deletions(-)


Re: pgsql: libpq: Notice errors a backend may have sent just before dying.

From
Tom Lane
Date:
Robert Haas <rhaas@postgresql.org> writes:
> libpq: Notice errors a backend may have sent just before dying.
> At least since the introduction of Hot Standby, the backend has
> sometimes sent fatal errors even when no client query was in
> progress, assuming that the client would receive it.  However,
> pqHandleSendFailure was not in sync with this assumption, and
> only tries to catch notices and notifies.  Add a parseInput call
> to the loop there to fix.

Just for the record, this patch changes no behavior whatsoever.
It could only make a difference if the backend were still actively
sending data, which certainly isn't so in the case claimed by the
commit message.  Moreover, removing the comment that explains why
only NOTICE and NOTIFY messages will be eaten doesn't make it not so.

            regards, tom lane


Re: pgsql: libpq: Notice errors a backend may have sent just before dying.

From
Robert Haas
Date:
On Thu, Nov 12, 2015 at 9:29 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <rhaas@postgresql.org> writes:
>> libpq: Notice errors a backend may have sent just before dying.
>> At least since the introduction of Hot Standby, the backend has
>> sometimes sent fatal errors even when no client query was in
>> progress, assuming that the client would receive it.  However,
>> pqHandleSendFailure was not in sync with this assumption, and
>> only tries to catch notices and notifies.  Add a parseInput call
>> to the loop there to fix.
>
> Just for the record, this patch changes no behavior whatsoever.

Actually, yes it does.  I tested.  Without the patch, the message sent
by ac1d7945f866b1928c2554c0f80fd52d7f977772 isn't received by psql.
With the patch, it is.

> It could only make a difference if the backend were still actively
> sending data, which certainly isn't so in the case claimed by the
> commit message.  Moreover, removing the comment that explains why
> only NOTICE and NOTIFY messages will be eaten doesn't make it not so.

I did post an analysis of why I believe these changes are correct on
pgsql-hackers.  Did you read it?  If it's wrong, perhaps you could
respond to that thread. In short, the old comment is appears to me to
be factually wrong: the claim that PGASYNC_IDLE ignores everything but
NOTICE and NOTIFY is at odds with the actual behavior of
pqParseInput3.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: pgsql: libpq: Notice errors a backend may have sent just before dying.

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> On Thu, Nov 12, 2015 at 9:29 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Just for the record, this patch changes no behavior whatsoever.

> Actually, yes it does.  I tested.  Without the patch, the message sent
> by ac1d7945f866b1928c2554c0f80fd52d7f977772 isn't received by psql.
> With the patch, it is.

Hmm ....

>> It could only make a difference if the backend were still actively
>> sending data, which certainly isn't so in the case claimed by the
>> commit message.  Moreover, removing the comment that explains why
>> only NOTICE and NOTIFY messages will be eaten doesn't make it not so.

> I did post an analysis of why I believe these changes are correct on
> pgsql-hackers.  Did you read it?  If it's wrong, perhaps you could
> respond to that thread. In short, the old comment is appears to me to
> be factually wrong: the claim that PGASYNC_IDLE ignores everything but
> NOTICE and NOTIFY is at odds with the actual behavior of
> pqParseInput3.

[ pokes around... ]  Yeah, that comment was out of date: it was obsoleted
by 58f481c4, but I missed updating this comment.

The point I was trying to make, though, was that whether parseInput will
process an 'E' message depends on the asyncStatus, and neither pqReadData
nor parseInput should change the asyncStatus if it's initially IDLE.

So on first glance, pumping pqReadData till it's dry and then doing
parseInput once should have the same effect as the new coding.  It's
not wrong to include a parseInput in the loop, and if there are multiple
messages available it might avoid buffer bloat, but this should not fix
any problem.

After looking around, I suspect what actually happened in your test
was that we kept pumping pqReadData until it realized it was seeing EOF,
whereupon it did pqDropConnection(), and guess what that does:

    /* Discard any unread/unsent data */
    conn->inStart = conn->inCursor = conn->inEnd = 0;
    conn->outCount = 0;

So the reason your patch appears to fix it is that you invoked parseInput
before the input buffer got wiped.  But I think a better fix would be to
remove the input-buffer flushing here.  Flushing the output buffer is
fine, since the data can never get sent, but it's a mistake to throw away
unprocessed data like that.  And there's surely no API guarantee by
pqReadData that the timing of its noticing EOF always works like this.

            regards, tom lane


Re: pgsql: libpq: Notice errors a backend may have sent just before dying.

From
Tom Lane
Date:
I wrote:
> After looking around, I suspect what actually happened in your test
> was that we kept pumping pqReadData until it realized it was seeing EOF,
> whereupon it did pqDropConnection(), and guess what that does:

>     /* Discard any unread/unsent data */
>     conn->inStart = conn->inCursor = conn->inEnd = 0;
>     conn->outCount = 0;

So after further review, this is a bug I introduced in 210eb9b74:
the fact that some code paths flushed the buffers and some did not
was less of an oversight than it appeared.  That explains why the
problem wasn't noticed years ago, because we'd certainly tested
pqHandleSendFailure and friends before.

I'm inclined to deal with this by adding a "dropInput" boolean flag
to pqDropConnection(), rather than reverting the centralization of
that logic altogether.

I'll go clean this up ...

            regards, tom lane


Re: pgsql: libpq: Notice errors a backend may have sent just before dying.

From
Robert Haas
Date:
On Thu, Nov 12, 2015 at 11:54 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I wrote:
>> After looking around, I suspect what actually happened in your test
>> was that we kept pumping pqReadData until it realized it was seeing EOF,
>> whereupon it did pqDropConnection(), and guess what that does:
>
>>       /* Discard any unread/unsent data */
>>       conn->inStart = conn->inCursor = conn->inEnd = 0;
>>       conn->outCount = 0;
>
> So after further review, this is a bug I introduced in 210eb9b74:
> the fact that some code paths flushed the buffers and some did not
> was less of an oversight than it appeared.  That explains why the
> problem wasn't noticed years ago, because we'd certainly tested
> pqHandleSendFailure and friends before.
>
> I'm inclined to deal with this by adding a "dropInput" boolean flag
> to pqDropConnection(), rather than reverting the centralization of
> that logic altogether.
>
> I'll go clean this up ...

Thanks.  I guess I should have studied this more deeply.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: pgsql: libpq: Notice errors a backend may have sent just before dying.

From
Michael Paquier
Date:
On Fri, Nov 13, 2015 at 1:54 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I wrote:
>> After looking around, I suspect what actually happened in your test
>> was that we kept pumping pqReadData until it realized it was seeing EOF,
>> whereupon it did pqDropConnection(), and guess what that does:
>
>>       /* Discard any unread/unsent data */
>>       conn->inStart = conn->inCursor = conn->inEnd = 0;
>>       conn->outCount = 0;
>
> So after further review, this is a bug I introduced in 210eb9b74:
> the fact that some code paths flushed the buffers and some did not
> was less of an oversight than it appeared.  That explains why the
> problem wasn't noticed years ago, because we'd certainly tested
> pqHandleSendFailure and friends before.
>
> I'm inclined to deal with this by adding a "dropInput" boolean flag
> to pqDropConnection(), rather than reverting the centralization of
> that logic altogether.
>
> I'll go clean this up ...

Interesting lesson. Thanks!
--
Michael