Thread: psql casts aspersions on server reliability

psql casts aspersions on server reliability

From
Robert Haas
Date:
psql tends to do things like this:

rhaas=# select * from pg_stat_activity;
FATAL:  terminating connection due to administrator command
server closed the connection unexpectedly   This probably means the server terminated abnormally   before or while
processingthe request.
 

Basically everything psql has to say about this is a lie:

1. It says the server closed the connection unexpectedly, but it just
finished processing a FATAL message from the server.  So how
unexpected is it that the connection would thereafter be closed?

2.  It says the server probably terminated abnormally, but PostgreSQL
is rarely so unreliable as to just give up the ghost and die.  The
ErrorResponse it just processed has an errcode of
ERRCODE_ADMIN_SHUTDOWN, so probably what happened is somebody either
shut down the server or killed the session.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: psql casts aspersions on server reliability

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> psql tends to do things like this:
> rhaas=# select * from pg_stat_activity;
> FATAL:  terminating connection due to administrator command
> server closed the connection unexpectedly
>     This probably means the server terminated abnormally
>     before or while processing the request.

> Basically everything psql has to say about this is a lie:

I cannot get terribly excited about this.  What you seem to be proposing
is that psql try to intuit the reason for connection closure from the
last error message it got, but that seems likely to lead to worse lies
than printing a boilerplate message.

I could go along with just dropping the last sentence ("This probably...")
if the last error we got was FATAL level.  I don't find "unexpectedly"
to be problematic here: from the point of view of psql, and probably
of its user, the shutdown *was* unexpected.
        regards, tom lane



Re: psql casts aspersions on server reliability

From
Robert Haas
Date:
On Wed, Sep 28, 2016 at 9:14 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> psql tends to do things like this:
>> rhaas=# select * from pg_stat_activity;
>> FATAL:  terminating connection due to administrator command
>> server closed the connection unexpectedly
>>     This probably means the server terminated abnormally
>>     before or while processing the request.
>
>> Basically everything psql has to say about this is a lie:
>
> I cannot get terribly excited about this.  What you seem to be proposing
> is that psql try to intuit the reason for connection closure from the
> last error message it got, but that seems likely to lead to worse lies
> than printing a boilerplate message.
>
> I could go along with just dropping the last sentence ("This probably...")
> if the last error we got was FATAL level.  I don't find "unexpectedly"
> to be problematic here: from the point of view of psql, and probably
> of its user, the shutdown *was* unexpected.

I don't care very much whether we try to intuit the reason for
connection closure or not; it could be done, but I don't feel that it
has to be done.  My bigger point is that currently psql speculates
that the reason for *every* connection closure is abnormal server
termination, which is actually a very rare event.

It may have been common when that message was added.
1a17447be1186fdd36391c58a2a0209f613d89c4 changed the wording this
message in 2001, and the original message seems to date to
011ee13131f6fa2f6dbafd3827b70d051cb28f64 in 1996.  And my guess is at
that time the server probably did just roll over and die with some
regularity.  But today it usually doesn't.  It's neither helpful nor
good PR for libpq to guess that the most likely cause of a server
disconnection is server unreliability.

I have seen actual instances of customers getting upset by this
message even though the server had been shut down quite cleanly.  The
message got into a logfile and induced minor panic.  Fortunately, I
have not seen this happen lately.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: psql casts aspersions on server reliability

From
David Steele
Date:
On 9/28/16 10:22 AM, Robert Haas wrote:
> On Wed, Sep 28, 2016 at 9:14 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Robert Haas <robertmhaas@gmail.com> writes:
>>> psql tends to do things like this:
>>> rhaas=# select * from pg_stat_activity;
>>> FATAL:  terminating connection due to administrator command
>>> server closed the connection unexpectedly
>>>     This probably means the server terminated abnormally
>>>     before or while processing the request.
>>
>>> Basically everything psql has to say about this is a lie:
>>
>> I cannot get terribly excited about this.  What you seem to be proposing
>> is that psql try to intuit the reason for connection closure from the
>> last error message it got, but that seems likely to lead to worse lies
>> than printing a boilerplate message.
>>
>> I could go along with just dropping the last sentence ("This probably...")
>> if the last error we got was FATAL level.  I don't find "unexpectedly"
>> to be problematic here: from the point of view of psql, and probably
>> of its user, the shutdown *was* unexpected.
> 
> I don't care very much whether we try to intuit the reason for
> connection closure or not; it could be done, but I don't feel that it
> has to be done.  My bigger point is that currently psql speculates
> that the reason for *every* connection closure is abnormal server
> termination, which is actually a very rare event.
> 
> It may have been common when that message was added.
> 1a17447be1186fdd36391c58a2a0209f613d89c4 changed the wording this
> message in 2001, and the original message seems to date to
> 011ee13131f6fa2f6dbafd3827b70d051cb28f64 in 1996.  And my guess is at
> that time the server probably did just roll over and die with some
> regularity.  But today it usually doesn't.  It's neither helpful nor
> good PR for libpq to guess that the most likely cause of a server
> disconnection is server unreliability.
> 
> I have seen actual instances of customers getting upset by this
> message even though the server had been shut down quite cleanly.  The
> message got into a logfile and induced minor panic.  Fortunately, I
> have not seen this happen lately.

+1 for making this error message less frightening.  I have also had to
explain it away on occasion.

-- 
-David
david@pgmasters.net



Re: psql casts aspersions on server reliability

From
Petr Jelinek
Date:
On 28/09/16 17:13, David Steele wrote:
> On 9/28/16 10:22 AM, Robert Haas wrote:
>> On Wed, Sep 28, 2016 at 9:14 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> Robert Haas <robertmhaas@gmail.com> writes:
>>>> psql tends to do things like this:
>>>> rhaas=# select * from pg_stat_activity;
>>>> FATAL:  terminating connection due to administrator command
>>>> server closed the connection unexpectedly
>>>>     This probably means the server terminated abnormally
>>>>     before or while processing the request.
>>>
>>>> Basically everything psql has to say about this is a lie:
>>>
>>> I cannot get terribly excited about this.  What you seem to be proposing
>>> is that psql try to intuit the reason for connection closure from the
>>> last error message it got, but that seems likely to lead to worse lies
>>> than printing a boilerplate message.
>>>
>>> I could go along with just dropping the last sentence ("This probably...")
>>> if the last error we got was FATAL level.  I don't find "unexpectedly"
>>> to be problematic here: from the point of view of psql, and probably
>>> of its user, the shutdown *was* unexpected.
>>
>> I don't care very much whether we try to intuit the reason for
>> connection closure or not; it could be done, but I don't feel that it
>> has to be done.  My bigger point is that currently psql speculates
>> that the reason for *every* connection closure is abnormal server
>> termination, which is actually a very rare event.
>>
>> It may have been common when that message was added.
>> 1a17447be1186fdd36391c58a2a0209f613d89c4 changed the wording this
>> message in 2001, and the original message seems to date to
>> 011ee13131f6fa2f6dbafd3827b70d051cb28f64 in 1996.  And my guess is at
>> that time the server probably did just roll over and die with some
>> regularity.  But today it usually doesn't.  It's neither helpful nor
>> good PR for libpq to guess that the most likely cause of a server
>> disconnection is server unreliability.
>>
>> I have seen actual instances of customers getting upset by this
>> message even though the server had been shut down quite cleanly.  The
>> message got into a logfile and induced minor panic.  Fortunately, I
>> have not seen this happen lately.
> 
> +1 for making this error message less frightening.  I have also had to
> explain it away on occasion.
> 

+1 I've seen this being misleading way too often.

--  Petr Jelinek                  http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training &
Services



Re: [HACKERS] psql casts aspersions on server reliability

From
Bruce Momjian
Date:
On Wed, Sep 28, 2016 at 09:14:41AM -0400, Tom Lane wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
> > psql tends to do things like this:
> > rhaas=# select * from pg_stat_activity;
> > FATAL:  terminating connection due to administrator command
> > server closed the connection unexpectedly
> >     This probably means the server terminated abnormally
> >     before or while processing the request.
> 
> > Basically everything psql has to say about this is a lie:
> 
> I cannot get terribly excited about this.  What you seem to be proposing
> is that psql try to intuit the reason for connection closure from the
> last error message it got, but that seems likely to lead to worse lies
> than printing a boilerplate message.
> 
> I could go along with just dropping the last sentence ("This probably...")
> if the last error we got was FATAL level.  I don't find "unexpectedly"
> to be problematic here: from the point of view of psql, and probably
> of its user, the shutdown *was* unexpected.

I looked at this thread from 2016 and I think the problem is the
"abnormally" word, since if the server was shutdown by the administrator
(most likely), it isn't abnormal.  Here is a patch to remove
"abnormally".

-- 
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
  EDB                                      https://enterprisedb.com

  Only you can decide what is important to you.

Attachment

Re: [HACKERS] psql casts aspersions on server reliability

From
Tom Lane
Date:
Bruce Momjian <bruce@momjian.us> writes:
> On Wed, Sep 28, 2016 at 09:14:41AM -0400, Tom Lane wrote:
>> I could go along with just dropping the last sentence ("This probably...")
>> if the last error we got was FATAL level.  I don't find "unexpectedly"
>> to be problematic here: from the point of view of psql, and probably
>> of its user, the shutdown *was* unexpected.

> I looked at this thread from 2016 and I think the problem is the
> "abnormally" word, since if the server was shutdown by the administrator
> (most likely), it isn't abnormal.  Here is a patch to remove
> "abnormally".

I do not think this is an improvement.  The places you are changing
are reacting to a connection closure.  *If* we had previously gotten a
"FATAL: terminating connection due to administrator command" message,
then yeah the connection closure is expected; but if not, it isn't.
Your patch adds no check for that.  (As I remarked in 2016, we could
probably condition this on the elevel being FATAL, rather than
checking for specific error messages.)

            regards, tom lane



Re: [HACKERS] psql casts aspersions on server reliability

From
Bruce Momjian
Date:
On Wed, Nov 22, 2023 at 07:38:52PM -0500, Tom Lane wrote:
> Bruce Momjian <bruce@momjian.us> writes:
> > On Wed, Sep 28, 2016 at 09:14:41AM -0400, Tom Lane wrote:
> >> I could go along with just dropping the last sentence ("This probably...")
> >> if the last error we got was FATAL level.  I don't find "unexpectedly"
> >> to be problematic here: from the point of view of psql, and probably
> >> of its user, the shutdown *was* unexpected.
> 
> > I looked at this thread from 2016 and I think the problem is the
> > "abnormally" word, since if the server was shutdown by the administrator
> > (most likely), it isn't abnormal.  Here is a patch to remove
> > "abnormally".
> 
> I do not think this is an improvement.  The places you are changing
> are reacting to a connection closure.  *If* we had previously gotten a
> "FATAL: terminating connection due to administrator command" message,
> then yeah the connection closure is expected; but if not, it isn't.
> Your patch adds no check for that.  (As I remarked in 2016, we could
> probably condition this on the elevel being FATAL, rather than
> checking for specific error messages.)

Yes, you are correct.  Here is a patch that implements the FATAL test,
though I am not sure I have the logic correct or backwards, and I don't
know how to test this.  Thanks.

-- 
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
  EDB                                      https://enterprisedb.com

  Only you can decide what is important to you.

Attachment

Re: [HACKERS] psql casts aspersions on server reliability

From
Bruce Momjian
Date:
On Wed, Nov 22, 2023 at 10:25:14PM -0500, Bruce Momjian wrote:
> On Wed, Nov 22, 2023 at 07:38:52PM -0500, Tom Lane wrote:
> > Bruce Momjian <bruce@momjian.us> writes:
> > > On Wed, Sep 28, 2016 at 09:14:41AM -0400, Tom Lane wrote:
> > >> I could go along with just dropping the last sentence ("This probably...")
> > >> if the last error we got was FATAL level.  I don't find "unexpectedly"
> > >> to be problematic here: from the point of view of psql, and probably
> > >> of its user, the shutdown *was* unexpected.
> > 
> > > I looked at this thread from 2016 and I think the problem is the
> > > "abnormally" word, since if the server was shutdown by the administrator
> > > (most likely), it isn't abnormal.  Here is a patch to remove
> > > "abnormally".
> > 
> > I do not think this is an improvement.  The places you are changing
> > are reacting to a connection closure.  *If* we had previously gotten a
> > "FATAL: terminating connection due to administrator command" message,
> > then yeah the connection closure is expected; but if not, it isn't.
> > Your patch adds no check for that.  (As I remarked in 2016, we could
> > probably condition this on the elevel being FATAL, rather than
> > checking for specific error messages.)
> 
> Yes, you are correct.  Here is a patch that implements the FATAL test,
> though I am not sure I have the logic correct or backwards, and I don't
> know how to test this.  Thanks.

I developed the attached patch which seems to work better.  In testing
kill -3 on a backend or calling elog(FATAL) in the server for a
session, libpq's 'res' is NULL, meaning we don't have any status to
check for PGRES_FATAL_ERROR.  It is very possible that libpq just isn't
stuctured to have the PGRES_FATAL_ERROR at the point where we issue this
message, and this is not worth improving.

    test=> select pg_sleep(100);
-->    FATAL:  FATAL called
    
    server closed the connection unexpectedly
-->            This probably means the server terminated null
            before or while processing the request.
    The connection to the server was lost. Attempting reset: Succeeded.

-- 
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
  EDB                                      https://enterprisedb.com

  Only you can decide what is important to you.

Attachment

Re: [HACKERS] psql casts aspersions on server reliability

From
Laurenz Albe
Date:
On Thu, 2023-11-23 at 11:12 -0500, Bruce Momjian wrote:
> On Wed, Nov 22, 2023 at 10:25:14PM -0500, Bruce Momjian wrote:
> > Yes, you are correct.  Here is a patch that implements the FATAL test,
> > though I am not sure I have the logic correct or backwards, and I don't
> > know how to test this.  Thanks.
>
> I developed the attached patch which seems to work better.  In testing
> kill -3 on a backend or calling elog(FATAL) in the server for a
> session, libpq's 'res' is NULL, meaning we don't have any status to
> check for PGRES_FATAL_ERROR.  It is very possible that libpq just isn't
> stuctured to have the PGRES_FATAL_ERROR at the point where we issue this
> message, and this is not worth improving.
>
>     test=> select pg_sleep(100);
> -->    FATAL:  FATAL called
>
>     server closed the connection unexpectedly
> -->            This probably means the server terminated null
>             before or while processing the request.
>     The connection to the server was lost. Attempting reset: Succeeded.

I don't thing "terminated null" is a meaningful message.

>      libpq_append_conn_error(conn, "server closed the connection unexpectedly\n"
> -                            "\tThis probably means the server terminated abnormally\n"
> -                            "\tbefore or while processing the request.");
> +                            "\tThis probably means the server terminated%s\n"
> +                            "\tbefore or while processing the request.",
> +                            (conn->result == NULL) ? " null" :
> +                            (conn->result->resultStatus == PGRES_FATAL_ERROR) ?
> +                            "" : " abnormally");

Apart from the weird "null", will that work well for translation?

> --- a/src/interfaces/libpq/fe-protocol3.c
> +++ b/src/interfaces/libpq/fe-protocol3.c
> @@ -2158,6 +2158,7 @@ pqFunctionCall3(PGconn *conn, Oid fnid,
>                  if (pqGetErrorNotice3(conn, true))
>                      continue;
>                  status = PGRES_FATAL_ERROR;
> +                fprintf(stderr, "Got 'E'\n");
>                  break;
>              case 'A':            /* notify message */
>                  /* handle notify and go back to processing return values */

That looks like a leftover debugging message.

Yours,
Laurenz Albe



Re: [HACKERS] psql casts aspersions on server reliability

From
Bruce Momjian
Date:
On Fri, Nov 24, 2023 at 04:06:22AM +0100, Laurenz Albe wrote:
> On Thu, 2023-11-23 at 11:12 -0500, Bruce Momjian wrote:
> > On Wed, Nov 22, 2023 at 10:25:14PM -0500, Bruce Momjian wrote:
> > > Yes, you are correct.  Here is a patch that implements the FATAL test,
> > > though I am not sure I have the logic correct or backwards, and I don't
> > > know how to test this.  Thanks.
> > 
> > I developed the attached patch which seems to work better.  In testing
> > kill -3 on a backend or calling elog(FATAL) in the server for a
> > session, libpq's 'res' is NULL, meaning we don't have any status to
> > check for PGRES_FATAL_ERROR.  It is very possible that libpq just isn't
> > stuctured to have the PGRES_FATAL_ERROR at the point where we issue this
> > message, and this is not worth improving.
> > 
> >     test=> select pg_sleep(100);
> > -->    FATAL:  FATAL called
> >     
> >     server closed the connection unexpectedly
> > -->            This probably means the server terminated null
> >             before or while processing the request.
> >     The connection to the server was lost. Attempting reset: Succeeded.
> 
> I don't thing "terminated null" is a meaningful message.

Yes, this is just a debug build so we can see the values of 'res'. 
Sorry for the confusion.  This attached patch has the elog() added so
you can reproduce what I saw.

I am actually unclear which exits should be labled as "abnormal".

-- 
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
  EDB                                      https://enterprisedb.com

  Only you can decide what is important to you.

Attachment

Re: [HACKERS] psql casts aspersions on server reliability

From
Bruce Momjian
Date:
On Fri, Nov 24, 2023 at 10:19:29AM -0500, Bruce Momjian wrote:
> On Fri, Nov 24, 2023 at 04:06:22AM +0100, Laurenz Albe wrote:
> > On Thu, 2023-11-23 at 11:12 -0500, Bruce Momjian wrote:
> > > On Wed, Nov 22, 2023 at 10:25:14PM -0500, Bruce Momjian wrote:
> > > > Yes, you are correct.  Here is a patch that implements the FATAL test,
> > > > though I am not sure I have the logic correct or backwards, and I don't
> > > > know how to test this.  Thanks.
> > > 
> > > I developed the attached patch which seems to work better.  In testing
> > > kill -3 on a backend or calling elog(FATAL) in the server for a
> > > session, libpq's 'res' is NULL, meaning we don't have any status to
> > > check for PGRES_FATAL_ERROR.  It is very possible that libpq just isn't
> > > structured to have the PGRES_FATAL_ERROR at the point where we issue this
> > > message, and this is not worth improving.
> > > 
> > >     test=> select pg_sleep(100);
> > > -->    FATAL:  FATAL called
> > >     
> > >     server closed the connection unexpectedly
> > > -->            This probably means the server terminated null
> > >             before or while processing the request.
> > >     The connection to the server was lost. Attempting reset: Succeeded.
> > 
> > I don't thing "terminated null" is a meaningful message.
> 
> Yes, this is just a debug build so we can see the values of 'res'. 
> Sorry for the confusion.  This attached patch has the elog() added so
> you can reproduce what I saw.
> 
> I am actually unclear which exits should be labeled as "abnormal".

There are five call sites which issue this message, so I looked at
adding "abnormally" just at the call sites where it made sense, but I
couldn't find a pattern.  I don't plan to pursue this further.

-- 
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
  EDB                                      https://enterprisedb.com

  Only you can decide what is important to you.