Thread: SIGTERM/FATAL error

SIGTERM/FATAL error

From
Bruce Momjian
Date:
I have applied the following patch to make SIGTERM backend exit clearer
in the server logs.  "The system" is not really shutting down, but
"the backend" is shutting down.

Should we be showing PIDs in the server logs more?  Do we need to
enable that somewhere?  The logs seem very hard to follow without
PIDs.

---------------------------------------------------------------------------


Index: src/backend/tcop/postgres.c
===================================================================
RCS file: /home/projects/pgsql/cvsroot/pgsql/src/backend/tcop/postgres.c,v
retrieving revision 1.208
diff -c -r1.208 postgres.c
*** src/backend/tcop/postgres.c    2001/02/24 02:04:51    1.208
--- src/backend/tcop/postgres.c    2001/03/11 19:04:47
***************
*** 1022,1028 ****
          ProcDiePending = false;
          QueryCancelPending = false;    /* ProcDie trumps QueryCancel */
          ImmediateInterruptOK = false;  /* not idle anymore */
!         elog(FATAL, "The system is shutting down");
      }
      if (QueryCancelPending)
      {
--- 1022,1028 ----
          ProcDiePending = false;
          QueryCancelPending = false;    /* ProcDie trumps QueryCancel */
          ImmediateInterruptOK = false;  /* not idle anymore */
!         elog(FATAL, "Backend shutting down");
      }
      if (QueryCancelPending)
      {

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
 


Re: SIGTERM/FATAL error

From
Tom Lane
Date:
Bruce Momjian <pgman@candle.pha.pa.us> writes:
> I have applied the following patch to make SIGTERM backend exit clearer
> in the server logs.  "The system" is not really shutting down, but
> "the backend" is shutting down.

This is a non-improvement.  Please reverse it.  SIGTERM would only be
sent to a backend if the database system were in fact shutting down.

> Should we be showing the PID's in the server logs more.  Do we need to
> enable that somewhere?

There's already an option for that, but it should not be forced on since
it will be redundant for those using syslog.
        regards, tom lane
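
The point about PIDs and syslog can be illustrated with a small C sketch. It is only an illustration, not the server's actual logging code: syslog can tag each message with the PID when the log is opened with LOG_PID, while a plain stderr log needs an explicit prefix. The helper names, the "[pid] message" layout, the "postgres" ident and the LOG_LOCAL0 facility are all assumptions made for this example.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <sys/types.h>
#include <syslog.h>
#include <unistd.h>

/* Hypothetical helper: build "[pid] message" for a plain stderr log.
 * Returns what snprintf returns (length of the full formatted line). */
int format_log_line(char *buf, size_t buflen, pid_t pid, const char *msg)
{
    return snprintf(buf, buflen, "[%ld] %s", (long) pid, msg);
}

/* stderr logging: the PID has to be added by the program itself. */
void log_to_stderr_with_pid(const char *msg)
{
    char line[512];

    format_log_line(line, sizeof(line), getpid(), msg);
    fprintf(stderr, "%s\n", line);
}

/* syslog logging: LOG_PID asks syslog to include the caller's PID,
 * so adding our own prefix as well would be redundant.
 * Ident and facility here are assumptions for the sketch. */
void log_to_syslog_with_pid(const char *msg)
{
    openlog("postgres", LOG_PID, LOG_LOCAL0);
    syslog(LOG_ERR, "%s", msg);
    closelog();
}
```

With something like this, a stderr log line carries the backend's PID in front of the message, and the syslog path leaves PID tagging to syslog itself.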


Re: SIGTERM/FATAL error

From
Bruce Momjian
Date:
> Bruce Momjian <pgman@candle.pha.pa.us> writes:
> > I have applied the following patch to make SIGTERM backend exit clearer
> > in the server logs.  "The system" is not really shutting down, but
> > "the backend" is shutting down.
> 
> This is a non-improvement.  Please reverse it.  SIGTERM would only be
> sent to a backend if the database system were in fact shutting down.

Reversed.

But why say the system is shutting down if the backend is shutting down?
Seems the postmaster should say system shutting down and each backend
should say it is shutting itself down.  The way it is now, don't we get
a "system shutting down" message for every running backend?


> 
> > Should we be showing the PID's in the server logs more.  Do we need to
> > enable that somewhere?
> 
> There's already an option for that, but it should not be forced on since
> it will be redundant for those using syslog.

Does syslog show the pid?

 


Re: SIGTERM/FATAL error

From
Bruce Momjian
Date:
> Bruce Momjian <pgman@candle.pha.pa.us> writes:
> > I have applied the following patch to make SIGTERM backend exit clearer
> > in the server logs.  "The system" is not really shutting down, but
> > "the backend" is shutting down.
> 
> This is a non-improvement.  Please reverse it.  SIGTERM would only be
> sent to a backend if the database system were in fact shutting down.

Also, what signal should people send to a backend to kill just that
backend?  In my reading of the code, I see:
    pqsignal(SIGTERM, die);     /* cancel current query and exit */
    pqsignal(SIGQUIT, die);     /* could reassign this sig for another use */
 

Is either of them safe?

 


Re: SIGTERM/FATAL error

From
Tom Lane
Date:
Bruce Momjian <pgman@candle.pha.pa.us> writes:
>> This is a non-improvement.  Please reverse it.  SIGTERM would only be
>> sent to a backend if the database system were in fact shutting down.

> But why say the system is shutting down if the backend is shutting down.
> Seems the postmaster should say system shutting down and each backend
> should say it is shutting itself down.  The way it is now, don't we get
> a "system shutting down" message for every running backend?

You are failing to consider that the primary audience for this error
message is not the system log, but the clients of the backends.  They
are going to see only one message, and they are going to want to know
*why* their backend shut down.
        regards, tom lane


Re: SIGTERM/FATAL error

From
Bruce Momjian
Date:
> Bruce Momjian <pgman@candle.pha.pa.us> writes:
> >> This is a non-improvement.  Please reverse it.  SIGTERM would only be
> >> sent to a backend if the database system were in fact shutting down.
> 
> > But why say the system is shutting down if the backend is shutting down.
> > Seems the postmaster should say system shutting down and each backend
> > should say it is shutting itself down.  The way it is now, don't we get
> > a "system shutting down" message for every running backend?
> 
> You are failing to consider that the primary audience for this error
> message is not the system log, but the clients of the backends.  They
> are going to see only one message, and they are going to want to know
> *why* their backend shut down.

Oops, I get it now.  Makes perfect sense.  Thanks.

I am using the SIGTERM in my administration application to allow
administrators to kill individual backends.  That is why I noticed the
message.


 


Re: SIGTERM/FATAL error

From
Tom Lane
Date:
Bruce Momjian <pgman@candle.pha.pa.us> writes:
> Also, what signal should people send to a backend to kill just that
> backend?

I don't know that we do or should recommend such a thing at all ...
but SIGTERM should work if anything does (and it is, not coincidentally,
the default kind of signal for kill(1)).

> In my reading of the code, I see:

>     pqsignal(SIGTERM, die);     /* cancel current query and exit */
>     pqsignal(SIGQUIT, die);     /* could reassign this sig for another use */

This is already obsolete ;=) ... I'm just waiting on Vadim's approval of
my xlog mods before committing a change in SIGQUIT handling --- see
discussion a couple days ago.
        regards, tom lane
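
For readers following the signal discussion, here is a minimal, self-contained C sketch of the pattern suggested by the ProcDiePending and QueryCancelPending flags in the patch at the top of the thread: the handler only records the request, and the process acts on it later at a safe point. This is not the backend's code. All names are hypothetical, and using SIGINT as the cancel signal is an assumption made for the example.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

/* Hypothetical flags modeled on ProcDiePending / QueryCancelPending. */
static volatile sig_atomic_t proc_die_pending = 0;
static volatile sig_atomic_t query_cancel_pending = 0;

enum interrupt_action { INTERRUPT_NONE, INTERRUPT_CANCEL, INTERRUPT_DIE };

/* Handlers do nothing but set a flag, which is async-signal-safe. */
static void die_handler(int signo)
{
    (void) signo;
    proc_die_pending = 1;
}

static void cancel_handler(int signo)
{
    (void) signo;
    query_cancel_pending = 1;
}

/* SIGTERM requests exit; SIGINT (an assumption here) requests cancel.
 * Returns 0 on success, -1 if a handler could not be installed. */
int install_handlers(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof(sa));
    sigemptyset(&sa.sa_mask);

    sa.sa_handler = die_handler;
    if (sigaction(SIGTERM, &sa, NULL) != 0)
        return -1;

    sa.sa_handler = cancel_handler;
    if (sigaction(SIGINT, &sa, NULL) != 0)
        return -1;

    return 0;
}

/* Called at points where it is safe to act on a pending request.
 * A pending exit request takes priority and clears any cancel. */
enum interrupt_action process_interrupts(void)
{
    if (proc_die_pending)
    {
        proc_die_pending = 0;
        query_cancel_pending = 0;   /* die trumps cancel */
        return INTERRUPT_DIE;
    }
    if (query_cancel_pending)
    {
        query_cancel_pending = 0;
        return INTERRUPT_CANCEL;
    }
    return INTERRUPT_NONE;
}
```

A caller would invoke install_handlers() once at startup and poll process_interrupts() between units of work, reporting the FATAL message and exiting when it returns INTERRUPT_DIE.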


Re: SIGTERM/FATAL error

From
Bruce Momjian
Date:
> Bruce Momjian <pgman@candle.pha.pa.us> writes:
> > Also, what signal should people send to a backend to kill just that
> > backend?
> 
> I don't know that we do or should recommend such a thing at all ...
> but SIGTERM should work if anything does (and it is, not coincidentally,
> the default kind of signal for kill(1)).
> 
> > In my reading of the code, I see:
> 
> >     pqsignal(SIGTERM, die);     /* cancel current query and exit */
> >     pqsignal(SIGQUIT, die);     /* could reassign this sig for another use */
> 
> This is already obsolete ;=) ... I'm just waiting on Vadim's approval of
> my xlog mods before committing a change in SIGQUIT handling --- see
> discussion a couple days ago.

Yes, I knew that was coming, so I stayed with SIGTERM because it should
work on 7.0 and 7.1.

 


Re: SIGTERM/FATAL error

From
Hiroshi Inoue
Date:
Tom Lane wrote:
> 
> Bruce Momjian <pgman@candle.pha.pa.us> writes:
> >> This is a non-improvement.  Please reverse it.  SIGTERM would only be
> >> sent to a backend if the database system were in fact shutting down.
> 
> > But why say the system is shutting down if the backend is shutting down.
> > Seems the postmaster should say system shutting down and each backend
> > should say it is shutting itself down.  The way it is now, don't we get
> > a "system shutting down" message for every running backend?
> 
> You are failing to consider that the primary audience for this error
> message is not the system log, but the clients of the backends.  They
> are going to see only one message, and they are going to want to know
> *why* their backend shut down.
> 

How could the backend know why it is being shut down?
Is it prohibited to kill a backend individually?
What is a real system shutdown message?
I agree with Bruce about changing the backend shutdown
message.

regards,
Hiroshi Inoue


Re: SIGTERM/FATAL error

From
Tom Lane
Date:
Bruce Momjian <pgman@candle.pha.pa.us> writes:
> I am using the SIGTERM in my administration application to allow
> administrators to kill individual backends.  That is why I noticed the
> message.

Hm.  Of course the backend cannot tell the difference between this use
of SIGTERM and its normal use for system shutdown.  Maybe we could
come up with a compromise message --- although I suspect a compromise
would just be more confusing.

A more significant issue is whether it's really a good idea to start
encouraging dbadmins to go around killing individual backends.  I think
this is likely to be a Bad Idea (tm).  We have no experience (that I know
of) with applying SIGTERM for any other purpose than system shutdown or
forced restart.  Are you really prepared to guarantee that shared memory
will always be left in a consistent state?  That there will be no locks
left locked, etc?
        regards, tom lane
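
As a side illustration of the "cannot tell the difference" point: with a plain signal handler the receiving process learns only the signal number. On platforms that support SA_SIGINFO, the handler can also see the sender's PID. The sketch below is not something proposed in this thread and not what the backend does. It assumes a process that treats a SIGTERM sent by its parent (the postmaster forks the backends) as a system shutdown and anything else as an individual kill. All function names are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

static volatile sig_atomic_t term_received = 0;
static volatile sig_atomic_t term_from_parent = 0;

/* SA_SIGINFO handler: record whether the sender was our parent.
 * getppid() is async-signal-safe, so it may be called here. */
static void term_handler(int signo, siginfo_t *info, void *ctx)
{
    (void) signo;
    (void) ctx;
    term_from_parent = (info != NULL &&
                        info->si_code == SI_USER &&
                        info->si_pid == getppid());
    term_received = 1;
}

/* Returns 0 on success, -1 if the handler could not be installed. */
int install_term_handler(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof(sa));
    sigemptyset(&sa.sa_mask);
    sa.sa_sigaction = term_handler;
    sa.sa_flags = SA_SIGINFO;
    return sigaction(SIGTERM, &sa, NULL);
}

int term_was_received(void)
{
    return term_received != 0;
}

/* Nonzero only if the last SIGTERM was sent via kill() by the parent. */
int term_came_from_parent(void)
{
    return term_from_parent != 0;
}
```

Even where this works, it only identifies the sending process. It says nothing about whether shared memory and locks are left consistent after the exit, which is the larger concern raised above.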


Re: SIGTERM/FATAL error

From
Bruce Momjian
Date:
> Bruce Momjian <pgman@candle.pha.pa.us> writes:
> > I am using the SIGTERM in my administration application to allow
> > administrators to kill individual backends.  That is why I noticed the
> > message.
> 
> Hm.  Of course the backend cannot tell the difference between this use
> of SIGTERM and its normal use for system shutdown.  Maybe we could
> come up with a compromise message --- although I suspect a compromise
> would just be more confusing.

How about "Connection terminated by administrator", or something like
that?


> 
> A more significant issue is whether it's really a good idea to start
> encouraging dbadmins to go around killing individual backends.  I think
> this is likely to be a Bad Idea (tm).  We have no experience (that I know
> of) with applying SIGTERM for any other purpose than system shutdown or
> forced restart.  Are you really prepared to guarantee that shared memory
> will always be left in a consistent state?  That there will be no locks
> left locked, etc?

Not sure.  My admin tool is more proof of concept at this point.  I
think ultimately we will need to allow administrators to do such
individual backend terminations.

 


Re: SIGTERM/FATAL error

From
Lincoln Yeoh
Date:
At 08:59 PM 11-03-2001 -0500, Bruce Momjian wrote:
>How about "Connection terminated by administrator", or something like
>that.

I prefer something closer to the truth.

e.g.
"Received SIGTERM, cancelling query and exiting"
(assuming it actually cancels the query).

But maybe I'm weird.

Cheerio,
Link.



Re: SIGTERM/FATAL error

From
Bruce Momjian
Date:
> Bruce Momjian <pgman@candle.pha.pa.us> writes:
> > Not sure.  My admin tool is more proof of concept at this point.  I
> > think ultimately we will need to allow administrators to do such
> > individual backend terminations.
> 
> I hope the tool is set up to encourage them to try something safer
> (ie, CANCEL QUERY) first...

Yes, the CANCEL button appears before the TERMINATE button.

On SIGTERM, I think we are fooling ourselves if we think people aren't
SIGTERM'ing individual backends.  Terminating individual db connections
is a very common job for an administrator.  If SIGTERM doesn't cause
proper shutdown for individual backends, I think it should.

 


Re: SIGTERM/FATAL error

From
Tom Lane
Date:
Bruce Momjian <pgman@candle.pha.pa.us> writes:
> Not sure.  My admin tool is more proof of concept at this point.  I
> think ultimately we will need to allow administrators to do such
> individual backend terminations.

I hope the tool is set up to encourage them to try something safer
(ie, CANCEL QUERY) first...
        regards, tom lane


Re: SIGTERM/FATAL error

From
Bruce Momjian
Date:
Tom, is there new wording we can agree on?

> Bruce Momjian <pgman@candle.pha.pa.us> writes:
> > Not sure.  My admin tool is more proof of concept at this point.  I
> > think ultimately we will need to allow administrators to do such
> > individual backend terminations.
> 
> I hope the tool is set up to encourage them to try something safer
> (ie, CANCEL QUERY) first...
> 
>             regards, tom lane
> 

