Thread: Re: [HACKERS] Function to kill backend

Re: [HACKERS] Function to kill backend

From

Bruce Momjian

Date:

11 April 2004, 01:00:37

I have applied the attached patch:

   Exit backend from SIGTERM or FATAL by simulating client EOF, rather than
   calling proc_exit() directly.  This should make SIGTERM more reliable.


---------------------------------------------------------------------------

Tom Lane wrote:
> Bruce Momjian <pgman@candle.pha.pa.us> writes:
> > On first glance, I don't see anything dangerous about SIGTERM.
>
> You haven't thought about it very hard :-(
>
> The major difference I see is that elog(FATAL) will call proc_exit
> directly from elog, rather than longjmp'ing back to PostgresMain.
> The case that we have confidence in involves elog(ERROR) returning to
> PostgresMain and then calling proc_exit from there (in the path where
> we get EOF from the client).
>
> This leaves me with a couple of concerns:
>
> * Notice all that cleanup/reset stuff in the "if (sigsetjmp())" block
> in PostgresMain.  SIGTERM will cause proc_exit to be entered without
> any of that being done first.  Does it work reliably?  Shouldn't this be
> refactored to ensure the same things happen in both cases?
>
> * There are various places, especially in the PLs, that try to hook into
> error recovery by manipulating Warn_restart.  Will any of them have
> problems if their error recovery code doesn't get called during SIGTERM
> exit?
>
> One possible refactoring is for elog(FATAL) to go ahead and longjmp back
> to PostgresMain, and at the end of the error recovery block check a flag
> and do proc_exit() if we're fataling.  However I am not sure that this
> doesn't break the design for coping with elog's during proc_exit.
>
> Alvaro's nested-transaction work is another thing that's got to be
> thought about before touching this code.  I have not yet seen any design
> for error recovery in the nested xact case, but I am sure it's going to
> need some changes right around here.
>
>             regards, tom lane
>

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
Index: src/backend/tcop/postgres.c
===================================================================
RCS file: /cvsroot/pgsql-server/src/backend/tcop/postgres.c,v
retrieving revision 1.398
diff -c -c -r1.398 postgres.c
*** src/backend/tcop/postgres.c    7 Apr 2004 05:05:49 -0000    1.398
--- src/backend/tcop/postgres.c    9 Apr 2004 17:42:46 -0000
***************
*** 2938,2944 ****
          /*
           * (3) read a command (loop blocks here)
           */
!         firstchar = ReadCommand(&input_message);

          /*
           * (4) disable async signal conditions again.
--- 2938,2947 ----
          /*
           * (3) read a command (loop blocks here)
           */
!          if (!in_fatal_exit)
!             firstchar = ReadCommand(&input_message);
!         else
!             firstchar = EOF;

          /*
           * (4) disable async signal conditions again.
***************
*** 3170,3176 ****
                   * Otherwise it will fail to be called during other
                   * backend-shutdown scenarios.
                   */
!                 proc_exit(0);

              case 'd':            /* copy data */
              case 'c':            /* copy done */
--- 3173,3180 ----
                   * Otherwise it will fail to be called during other
                   * backend-shutdown scenarios.
                   */
!                 proc_exit(!in_fatal_exit ? 0 : proc_exit_inprogress ||
!                                                 !IsUnderPostmaster);

              case 'd':            /* copy data */
              case 'c':            /* copy done */
Index: src/backend/utils/error/elog.c
===================================================================
RCS file: /cvsroot/pgsql-server/src/backend/utils/error/elog.c,v
retrieving revision 1.132
diff -c -c -r1.132 elog.c
*** src/backend/utils/error/elog.c    5 Apr 2004 03:02:06 -0000    1.132
--- src/backend/utils/error/elog.c    9 Apr 2004 17:42:47 -0000
***************
*** 72,77 ****
--- 72,79 ----
  char       *Log_line_prefix = NULL; /* format for extra log line info */
  unsigned int Log_destination;

+ bool in_fatal_exit = false;
+
  #ifdef HAVE_SYSLOG
  char       *Syslog_facility;    /* openlog() parameters */
  char       *Syslog_ident;
***************
*** 442,448 ****
               */
              fflush(stdout);
              fflush(stderr);
!             proc_exit(proc_exit_inprogress || !IsUnderPostmaster);
          }

          /*
--- 444,455 ----
               */
              fflush(stdout);
              fflush(stderr);
!
!             if (in_fatal_exit)
!                 ereport(PANIC, (errmsg("fatal error during fatal exit, giving up")));
!
!             /* We will exit the backend by simulating a client EOF */
!             in_fatal_exit = true;
          }

          /*
Index: src/include/tcop/tcopprot.h
===================================================================
RCS file: /cvsroot/pgsql-server/src/include/tcop/tcopprot.h,v
retrieving revision 1.64
diff -c -c -r1.64 tcopprot.h
*** src/include/tcop/tcopprot.h    7 Apr 2004 05:05:50 -0000    1.64
--- src/include/tcop/tcopprot.h    9 Apr 2004 17:42:47 -0000
***************
*** 34,39 ****
--- 34,40 ----
  extern DLLIMPORT const char *debug_query_string;
  extern char *rendezvous_name;
  extern int    max_stack_depth;
+ extern bool in_fatal_exit;

  /* GUC-configurable parameters */

Re: [HACKERS] Function to kill backend

From

Tom Lane

Date:

24 July 2004, 20:36:12

Bruce Momjian <pgman@candle.pha.pa.us> writes:
> I have applied the attached patch:
>    Exit backend from SIGTERM or FATAL by simulating client EOF, rather than
>    calling proc_exit() directly.  This should make SIGTERM more reliable.

After further consideration I have concluded that this was a
spectacularly bad idea and we should revert that patch.  There is a very
large amount of processing that this patch will cause to happen after a
FATAL error has been declared, and I doubt that any of it is a good
idea.  Some examples:

* AbortCurrentTransaction() --- not too cool if the FATAL error was one
  of the ones in xact.c that are complaining of fatally bollixed
  transaction state.

* pgstat reporting --- aside from the chance of an outright crash, we
  might be transmitting bogus statistics to the collector.

* sending a ReadyForQuery (Z) message --- one thing we quite certainly
  ain't is ReadyForQuery.

* EnableNotifyInterrupt --- this may result in actually trying to run
  a transaction to look through pg_listener :-(

* ProcessConfigFile, if we had a pending SIGHUP --- also not too cool,
  if the FATAL was from guc.c.

I am still dubious that zapping random backends with SIGTERM is a sane
or supportable idea.  But this patch does not make things better, it
simply greatly increases the chance of a FATAL exit turning into a
backend crash or PANIC.

            regards, tom lane

Re: [HACKERS] Function to kill backend

From

Bruce Momjian

Date:

24 July 2004, 21:44:30

OK, I see your point.  Can anyone remember why this was needed?  I
remember Magnus wanted query cancel, but what was the logic for session
termination?

---------------------------------------------------------------------------

Tom Lane wrote:
> Bruce Momjian <pgman@candle.pha.pa.us> writes:
> > I have applied the attached patch:
> >    Exit backend from SIGTERM or FATAL by simulating client EOF, rather than
> >    calling proc_exit() directly.  This should make SIGTERM more reliable.
>
> After further consideration I have concluded that this was a
> spectacularly bad idea and we should revert that patch.  There is a very
> large amount of processing that this patch will cause to happen after a
> FATAL error has been declared, and I doubt that any of it is a good
> idea.  Some examples:
>
> * AbortCurrentTransaction() --- not too cool if the FATAL error was one
>   of the ones in xact.c that are complaining of fatally bollixed
>   transaction state.
>
> * pgstat reporting --- aside from the chance of an outright crash, we
>   might be transmitting bogus statistics to the collector.
>
> * sending a ReadyForQuery (Z) message --- one thing we quite certainly
>   ain't is ReadyForQuery.
>
> * EnableNotifyInterrupt --- this may result in actually trying to run
>   a transaction to look through pg_listener :-(
>
> * ProcessConfigFile, if we had a pending SIGHUP --- also not too cool,
>   if the FATAL was from guc.c.
>
>
> I am still dubious that zapping random backends with SIGTERM is a sane
> or supportable idea.  But this patch does not make things better, it
> simply greatly increases the chance of a FATAL exit turning into a
> backend crash or PANIC.
>
>             regards, tom lane
>

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073

Re: [HACKERS] Function to kill backend

From

Tom Lane

Date:

24 July 2004, 22:51:15

Bruce Momjian <pgman@candle.pha.pa.us> writes:
> OK, I see your point.  Can anyone remember why this was needed?  I
> remember Magnus wanted query cancel, but what was the logic for session
> termination?

I think the argument for it went along the lines of "what if you've got
a misbehaving client that continually issues expensive queries, so
canceling any one query doesn't help?"  I found this unconvincing, as
a client that obstreperous might simply reconnect and start issuing the
same queries again; so having a session-kill tool doesn't really get you
much further.  I recall being voted down though ...

            regards, tom lane

Re: [HACKERS] Function to kill backend

From

Bruce Momjian

Date:

24 July 2004, 22:56:13

Tom Lane wrote:
> Bruce Momjian <pgman@candle.pha.pa.us> writes:
> > OK, I see your point.  Can anyone remember why this was needed?  I
> > remember Magnus wanted query cancel, but what was the logic for session
> > termination?
>
> I think the argument for it went along the lines of "what if you've got
> a misbehaving client that continually issues expensive queries, so
> canceling any one query doesn't help?"  I found this unconvincing, as
> a client that obstreperous might simply reconnect and start issuing the
> same queries again; so having a session-kill tool doesn't really get you
> much further.  I recall being voted down though ...

Well, seems you have enough reason to suspect this code, and being so
near to beta I think we can remove it and come back to it later if
people still want it.  I remember the "cancel" being the key feature,
and termination was more of a "nice have" item.

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073