Thread: Re: [HACKERS] Terminating a backend

Re: [HACKERS] Terminating a backend

From
Bruce Momjian
Date:
Bruce Momjian wrote:
> I have an idea for this TODO item:
>
>     * Allow administrators to safely terminate individual sessions either
>       via an SQL function or SIGTERM
>
>       Lock table corruption following SIGTERM of an individual backend
>       has been reported in 8.0.  A possible cause was fixed in 8.1, but
>       it is unknown whether other problems exist.  This item mostly
>       requires additional testing rather than of writing any new code.
>
>       http://archives.postgresql.org/pgsql-hackers/2006-08/msg00174.php
>
> When we get the termination signal, why can't we just set a global
> boolean, do a query cancel, and in the setjmp() code block check the
> global and exit --- at that stage we know we have released all locks and
> can exit cleanly.

I have implemented this idea with the attached patch.

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +
Index: doc/src/sgml/func.sgml
===================================================================
RCS file: /cvsroot/pgsql/doc/src/sgml/func.sgml,v
retrieving revision 1.429
diff -c -c -r1.429 func.sgml
*** doc/src/sgml/func.sgml    10 Apr 2008 13:34:33 -0000    1.429
--- doc/src/sgml/func.sgml    12 Apr 2008 13:48:33 -0000
***************
*** 11849,11854 ****
--- 11849,11857 ----
      <primary>pg_cancel_backend</primary>
     </indexterm>
     <indexterm>
+     <primary>pg_terminate_backend</primary>
+    </indexterm>
+    <indexterm>
      <primary>pg_reload_conf</primary>
     </indexterm>
     <indexterm>
***************
*** 11885,11890 ****
--- 11888,11900 ----
        </row>
        <row>
         <entry>
+         <literal><function>pg_terminate_backend</function>(<parameter>pid</parameter> <type>int</>)</literal>
+         </entry>
+        <entry><type>boolean</type></entry>
+        <entry>Terminate a backend</entry>
+       </row>
+       <row>
+        <entry>
          <literal><function>pg_reload_conf</function>()</literal>
          </entry>
         <entry><type>boolean</type></entry>
***************
*** 11909,11915 ****
     <para>
      <function>pg_cancel_backend</> sends a query cancel
      (<systemitem>SIGINT</>) signal to a backend process identified by
!     process ID.  The process ID of an active backend can be found from
      the <structfield>procpid</structfield> column in the
      <structname>pg_stat_activity</structname> view, or by listing the
      <command>postgres</command> processes on the server with
--- 11919,11927 ----
     <para>
      <function>pg_cancel_backend</> sends a query cancel
      (<systemitem>SIGINT</>) signal to a backend process identified by
!     process ID.  <function>pg_terminate_backend</> sends a terminate
!     (<systemitem>SIGTERM</>) signal to the specified process ID. The
!     process ID of an active backend can be found from
      the <structfield>procpid</structfield> column in the
      <structname>pg_stat_activity</structname> view, or by listing the
      <command>postgres</command> processes on the server with
Index: src/backend/port/ipc_test.c
===================================================================
RCS file: /cvsroot/pgsql/src/backend/port/ipc_test.c,v
retrieving revision 1.24
diff -c -c -r1.24 ipc_test.c
*** src/backend/port/ipc_test.c    24 Mar 2008 18:08:47 -0000    1.24
--- src/backend/port/ipc_test.c    12 Apr 2008 13:48:33 -0000
***************
*** 40,45 ****
--- 40,46 ----

  volatile bool InterruptPending = false;
  volatile bool QueryCancelPending = false;
+ volatile bool BackendTerminatePending = false;
  volatile bool ProcDiePending = false;
  volatile bool ImmediateInterruptOK = false;
  volatile uint32 InterruptHoldoffCount = 0;
Index: src/backend/postmaster/autovacuum.c
===================================================================
RCS file: /cvsroot/pgsql/src/backend/postmaster/autovacuum.c,v
retrieving revision 1.76
diff -c -c -r1.76 autovacuum.c
*** src/backend/postmaster/autovacuum.c    26 Mar 2008 21:10:38 -0000    1.76
--- src/backend/postmaster/autovacuum.c    12 Apr 2008 13:48:33 -0000
***************
*** 1475,1481 ****
       * means abort and exit cleanly, and SIGQUIT means abandon ship.
       */
      pqsignal(SIGINT, StatementCancelHandler);
!     pqsignal(SIGTERM, die);
      pqsignal(SIGQUIT, quickdie);
      pqsignal(SIGALRM, handle_sig_alarm);

--- 1475,1481 ----
       * means abort and exit cleanly, and SIGQUIT means abandon ship.
       */
      pqsignal(SIGINT, StatementCancelHandler);
!     pqsignal(SIGTERM, BackendTerminateHandler);
      pqsignal(SIGQUIT, quickdie);
      pqsignal(SIGALRM, handle_sig_alarm);

Index: src/backend/tcop/postgres.c
===================================================================
RCS file: /cvsroot/pgsql/src/backend/tcop/postgres.c,v
retrieving revision 1.548
diff -c -c -r1.548 postgres.c
*** src/backend/tcop/postgres.c    2 Apr 2008 18:31:50 -0000    1.548
--- src/backend/tcop/postgres.c    12 Apr 2008 13:48:33 -0000
***************
*** 2541,2547 ****
           * waiting for input, however.
           */
          if (ImmediateInterruptOK && InterruptHoldoffCount == 0 &&
!             CritSectionCount == 0 && !DoingCommandRead)
          {
              /* bump holdoff count to make ProcessInterrupts() a no-op */
              /* until we are done getting ready for it */
--- 2541,2548 ----
           * waiting for input, however.
           */
          if (ImmediateInterruptOK && InterruptHoldoffCount == 0 &&
!             CritSectionCount == 0 &&
!             (!DoingCommandRead || BackendTerminatePending))
          {
              /* bump holdoff count to make ProcessInterrupts() a no-op */
              /* until we are done getting ready for it */
***************
*** 2557,2562 ****
--- 2558,2575 ----
      errno = save_errno;
  }

+ /*
+  * Backend terminate happens by canceling the current query, then
+  * exiting after the longjump() returns to the main query loop.
+  * This is done so that all resources are properly reset before exit.
+  */
+ void
+ BackendTerminateHandler(SIGNAL_ARGS)
+ {
+     BackendTerminatePending = true;
+     StatementCancelHandler(postgres_signal_arg);
+ }
+
  /* signal handler for floating point exception */
  void
  FloatExceptionHandler(SIGNAL_ARGS)
***************
*** 2621,2626 ****
--- 2634,2643 ----
              ereport(ERROR,
                      (errcode(ERRCODE_QUERY_CANCELED),
                       errmsg("canceling autovacuum task")));
+         else if (BackendTerminatePending)
+                 ereport(ERROR,
+                     (errcode(ERRCODE_QUERY_CANCELED),
+                      errmsg("terminating backend due to user request")));
          else
              ereport(ERROR,
                      (errcode(ERRCODE_QUERY_CANCELED),
***************
*** 3154,3160 ****
       */
      pqsignal(SIGHUP, SigHupHandler);    /* set flag to read config file */
      pqsignal(SIGINT, StatementCancelHandler);    /* cancel current query */
!     pqsignal(SIGTERM, die);        /* cancel current query and exit */

      /*
       * In a standalone backend, SIGQUIT can be generated from the keyboard
--- 3171,3177 ----
       */
      pqsignal(SIGHUP, SigHupHandler);    /* set flag to read config file */
      pqsignal(SIGINT, StatementCancelHandler);    /* cancel current query */
!     pqsignal(SIGTERM, BackendTerminateHandler);

      /*
       * In a standalone backend, SIGQUIT can be generated from the keyboard
***************
*** 3459,3464 ****
--- 3476,3484 ----
          /* We don't have a transaction command open anymore */
          xact_started = false;

+         if (BackendTerminatePending)
+             die(SIGTERM);
+
          /* Now we can allow interrupts again */
          RESUME_INTERRUPTS();
      }
Index: src/backend/utils/adt/misc.c
===================================================================
RCS file: /cvsroot/pgsql/src/backend/utils/adt/misc.c,v
retrieving revision 1.59
diff -c -c -r1.59 misc.c
*** src/backend/utils/adt/misc.c    4 Apr 2008 16:57:21 -0000    1.59
--- src/backend/utils/adt/misc.c    12 Apr 2008 13:48:33 -0000
***************
*** 129,134 ****
--- 129,140 ----
  }

  Datum
+ pg_terminate_backend(PG_FUNCTION_ARGS)
+ {
+     PG_RETURN_BOOL(pg_signal_backend(PG_GETARG_INT32(0), SIGTERM));
+ }
+
+ Datum
  pg_reload_conf(PG_FUNCTION_ARGS)
  {
      if (!superuser())
Index: src/backend/utils/init/globals.c
===================================================================
RCS file: /cvsroot/pgsql/src/backend/utils/init/globals.c,v
retrieving revision 1.105
diff -c -c -r1.105 globals.c
*** src/backend/utils/init/globals.c    17 Feb 2008 02:09:29 -0000    1.105
--- src/backend/utils/init/globals.c    12 Apr 2008 13:48:33 -0000
***************
*** 27,32 ****
--- 27,33 ----

  volatile bool InterruptPending = false;
  volatile bool QueryCancelPending = false;
+ volatile bool BackendTerminatePending = false;
  volatile bool ProcDiePending = false;
  volatile bool ImmediateInterruptOK = false;
  volatile uint32 InterruptHoldoffCount = 0;
Index: src/include/miscadmin.h
===================================================================
RCS file: /cvsroot/pgsql/src/include/miscadmin.h,v
retrieving revision 1.201
diff -c -c -r1.201 miscadmin.h
*** src/include/miscadmin.h    20 Feb 2008 22:46:24 -0000    1.201
--- src/include/miscadmin.h    12 Apr 2008 13:48:33 -0000
***************
*** 68,73 ****
--- 68,74 ----
  /* these are marked volatile because they are set by signal handlers: */
  extern PGDLLIMPORT volatile bool InterruptPending;
  extern volatile bool QueryCancelPending;
+ extern volatile bool BackendTerminatePending;
  extern volatile bool ProcDiePending;

  /* these are marked volatile because they are examined by signal handlers: */
Index: src/include/catalog/pg_proc.h
===================================================================
RCS file: /cvsroot/pgsql/src/include/catalog/pg_proc.h,v
retrieving revision 1.488
diff -c -c -r1.488 pg_proc.h
*** src/include/catalog/pg_proc.h    10 Apr 2008 22:25:25 -0000    1.488
--- src/include/catalog/pg_proc.h    12 Apr 2008 13:48:33 -0000
***************
*** 3157,3162 ****
--- 3157,3164 ----

  DATA(insert OID = 2171 ( pg_cancel_backend        PGNSP PGUID 12 1 0 f f t f v 1 16 "23" _null_ _null_ _null_
pg_cancel_backend- _null_ _null_ )); 
  DESCR("cancel a server process' current query");
+ DATA(insert OID = 2096 ( pg_terminate_backend        PGNSP PGUID 12 1 0 f f t f v 1 16 "23" _null_ _null_ _null_
pg_terminate_backend- _null_ _null_ )); 
+ DESCR("terminate a server process");
  DATA(insert OID = 2172 ( pg_start_backup        PGNSP PGUID 12 1 0 f f t f v 1 25 "25" _null_ _null_ _null_
pg_start_backup- _null_ _null_ )); 
  DESCR("prepare for taking an online backup");
  DATA(insert OID = 2173 ( pg_stop_backup            PGNSP PGUID 12 1 0 f f t f v 0 25 "" _null_ _null_ _null_
pg_stop_backup- _null_ _null_ )); 
Index: src/include/tcop/tcopprot.h
===================================================================
RCS file: /cvsroot/pgsql/src/include/tcop/tcopprot.h,v
retrieving revision 1.93
diff -c -c -r1.93 tcopprot.h
*** src/include/tcop/tcopprot.h    10 Mar 2008 12:55:13 -0000    1.93
--- src/include/tcop/tcopprot.h    12 Apr 2008 13:48:33 -0000
***************
*** 60,65 ****
--- 60,66 ----
  extern void quickdie(SIGNAL_ARGS);
  extern void authdie(SIGNAL_ARGS);
  extern void StatementCancelHandler(SIGNAL_ARGS);
+ extern void BackendTerminateHandler(SIGNAL_ARGS);
  extern void FloatExceptionHandler(SIGNAL_ARGS);
  extern void prepare_for_client_read(void);
  extern void client_read_ended(void);
Index: src/include/utils/builtins.h
===================================================================
RCS file: /cvsroot/pgsql/src/include/utils/builtins.h,v
retrieving revision 1.312
diff -c -c -r1.312 builtins.h
*** src/include/utils/builtins.h    4 Apr 2008 18:45:36 -0000    1.312
--- src/include/utils/builtins.h    12 Apr 2008 13:48:33 -0000
***************
*** 416,421 ****
--- 416,422 ----
  extern Datum current_database(PG_FUNCTION_ARGS);
  extern Datum current_query(PG_FUNCTION_ARGS);
  extern Datum pg_cancel_backend(PG_FUNCTION_ARGS);
+ extern Datum pg_terminate_backend(PG_FUNCTION_ARGS);
  extern Datum pg_reload_conf(PG_FUNCTION_ARGS);
  extern Datum pg_tablespace_databases(PG_FUNCTION_ARGS);
  extern Datum pg_rotate_logfile(PG_FUNCTION_ARGS);

Re: [HACKERS] Terminating a backend

From
Bruce Momjian
Date:
Bruce Momjian wrote:
> Bruce Momjian wrote:
> > I have an idea for this TODO item:
> >
> >     * Allow administrators to safely terminate individual sessions either
> >       via an SQL function or SIGTERM
> >
> >       Lock table corruption following SIGTERM of an individual backend
> >       has been reported in 8.0.  A possible cause was fixed in 8.1, but
> >       it is unknown whether other problems exist.  This item mostly
> >       requires additional testing rather than of writing any new code.
> >
> >       http://archives.postgresql.org/pgsql-hackers/2006-08/msg00174.php
> >
> > When we get the termination signal, why can't we just set a global
> > boolean, do a query cancel, and in the setjmp() code block check the
> > global and exit --- at that stage we know we have released all locks and
> > can exit cleanly.
>
> I have implemented this idea with the attached patch.

One problem I have with my patch is that SIGTERM is currently used by
the postmaster to shut down backends.  Now because the postmaster knows
that all backend are terminating, it can accept a dirtier shutdown than
one where we are terminating just one backend and the rest are going to
keep running.  The new SIGTERM coding is going to exit a backend only in
a place where cancel is checked.

I need some thoughts in this area.

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

Re: [HACKERS] Terminating a backend

From
Tom Lane
Date:
Bruce Momjian <bruce@momjian.us> writes:
>> When we get the termination signal, why can't we just set a global
>> boolean, do a query cancel, and in the setjmp() code block check the
>> global and exit --- at that stage we know we have released all locks and
>> can exit cleanly.

> I have implemented this idea with the attached patch.

It was already explained to you why this is a bad idea.

            regards, tom lane

Re: [HACKERS] Terminating a backend

From
Bruce Momjian
Date:
Tom Lane wrote:
> Bruce Momjian <bruce@momjian.us> writes:
> >> When we get the termination signal, why can't we just set a global
> >> boolean, do a query cancel, and in the setjmp() code block check the
> >> global and exit --- at that stage we know we have released all locks and
> >> can exit cleanly.
>
> > I have implemented this idea with the attached patch.
>
> It was already explained to you why this is a bad idea.

Yes, I remember your comments:

    http://archives.postgresql.org/pgsql-hackers/2008-03/msg00145.php

    Keep in mind that 99% of the excuse for people to want to use SIGTERM is
    that the backend isn't responding to SIGINT.  If you've fixed things so
    that SIGTERM cannot get them out of any situation that SIGINT doesn't
    get them out of, I don't think it's a step forward.

    ...

    [shrug...]  They can do that now, most of the time.  What this is about
    is dealing with corner cases, and in that respect what your proposal
    will do is replace soluble problems with insoluble ones.  But I suppose
    I can't stop you if you're insistent.

I need more than one person to tell me it is a bad idea because I think
it is a good idea.

If we tell people SIGTERM is not safe to use then why is making it safe
worse.  And if it is safe, we can just add a function to enable SIGTERM
to the backend without doing the query cancel longjump().

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

Re: [HACKERS] Terminating a backend

From
Bruce Momjian
Date:
Bruce Momjian wrote:
> > > When we get the termination signal, why can't we just set a global
> > > boolean, do a query cancel, and in the setjmp() code block check the
> > > global and exit --- at that stage we know we have released all locks and
> > > can exit cleanly.
> >
> > I have implemented this idea with the attached patch.
>
> One problem I have with my patch is that SIGTERM is currently used by
> the postmaster to shut down backends.  Now because the postmaster knows
> that all backend are terminating, it can accept a dirtier shutdown than
> one where we are terminating just one backend and the rest are going to
> keep running.  The new SIGTERM coding is going to exit a backend only in
> a place where cancel is checked.

I have a idea --- to have pg_terminate_backend() set a PGPROC boolean
and then send a query cancel signal to the backend --- the backend can
then check the boolean and exit if required.  I will work on a new
version of this patch tomorrow/Monday.

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

Re: [HACKERS] Terminating a backend

From
Alvaro Herrera
Date:
Bruce Momjian wrote:

> I have a idea --- to have pg_terminate_backend() set a PGPROC boolean
> and then send a query cancel signal to the backend --- the backend can
> then check the boolean and exit if required.  I will work on a new
> version of this patch tomorrow/Monday.

That's fine, even if it fails on 0.1% of the cases, but you can't use
SIGTERM to implement that because it already has a different meaning.

--
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

Terminating a backend

From
Bruce Momjian
Date:
oikBruce Momjian wrote:
> Bruce Momjian wrote:
> > > > When we get the termination signal, why can't we just set a global
> > > > boolean, do a query cancel, and in the setjmp() code block check the
> > > > global and exit --- at that stage we know we have released all locks and
> > > > can exit cleanly.
> > >
> > > I have implemented this idea with the attached patch.
> >
> > One problem I have with my patch is that SIGTERM is currently used by
> > the postmaster to shut down backends.  Now because the postmaster knows
> > that all backend are terminating, it can accept a dirtier shutdown than
> > one where we are terminating just one backend and the rest are going to
> > keep running.  The new SIGTERM coding is going to exit a backend only in
> > a place where cancel is checked.
>
> I have a idea --- to have pg_terminate_backend() set a PGPROC boolean
> and then send a query cancel signal to the backend --- the backend can
> then check the boolean and exit if required.  I will work on a new
> version of this patch tomorrow/Monday.

Updated patch attached.  I didn't modify SIGTERM at all but set a PRPROC
boolean and piggybacked on SIGINT.  I think I got the PGPROC locking
right.  I had to split apart pg_signal_backend() so I could do the
permission and pid checks independent of the signalling, because I
pg_terminate_backend() needs to check, then set the PGPROC variable,
then send the signal.

I also added an administration doc mention about when to use
pg_terminate_backend().

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +
Index: doc/src/sgml/func.sgml
===================================================================
RCS file: /cvsroot/pgsql/doc/src/sgml/func.sgml,v
retrieving revision 1.429
diff -c -c -r1.429 func.sgml
*** doc/src/sgml/func.sgml    10 Apr 2008 13:34:33 -0000    1.429
--- doc/src/sgml/func.sgml    14 Apr 2008 17:19:10 -0000
***************
*** 11849,11854 ****
--- 11849,11857 ----
      <primary>pg_cancel_backend</primary>
     </indexterm>
     <indexterm>
+     <primary>pg_terminate_backend</primary>
+    </indexterm>
+    <indexterm>
      <primary>pg_reload_conf</primary>
     </indexterm>
     <indexterm>
***************
*** 11885,11890 ****
--- 11888,11900 ----
        </row>
        <row>
         <entry>
+         <literal><function>pg_terminate_backend</function>(<parameter>pid</parameter> <type>int</>)</literal>
+         </entry>
+        <entry><type>boolean</type></entry>
+        <entry>Terminate a backend</entry>
+       </row>
+       <row>
+        <entry>
          <literal><function>pg_reload_conf</function>()</literal>
          </entry>
         <entry><type>boolean</type></entry>
***************
*** 11907,11915 ****
     </para>

     <para>
!     <function>pg_cancel_backend</> sends a query cancel
!     (<systemitem>SIGINT</>) signal to a backend process identified by
!     process ID.  The process ID of an active backend can be found from
      the <structfield>procpid</structfield> column in the
      <structname>pg_stat_activity</structname> view, or by listing the
      <command>postgres</command> processes on the server with
--- 11917,11926 ----
     </para>

     <para>
!     <function>pg_cancel_backend</> and <function>pg_terminate_backend</>
!     send a query cancel (<systemitem>SIGINT</>) signal to a backend process
!     identified by process ID.  The
!     process ID of an active backend can be found from
      the <structfield>procpid</structfield> column in the
      <structname>pg_stat_activity</structname> view, or by listing the
      <command>postgres</command> processes on the server with
Index: doc/src/sgml/runtime.sgml
===================================================================
RCS file: /cvsroot/pgsql/doc/src/sgml/runtime.sgml,v
retrieving revision 1.411
diff -c -c -r1.411 runtime.sgml
*** doc/src/sgml/runtime.sgml    31 Mar 2008 02:43:14 -0000    1.411
--- doc/src/sgml/runtime.sgml    14 Apr 2008 17:19:10 -0000
***************
*** 1372,1377 ****
--- 1372,1384 ----
      well.
     </para>
    </important>
+
+   <para>
+    To terminate a session while allowing other sessions to continue, use
+    <function>pg_terminate_backend()</> (<xref
+    linkend="functions-admin-signal-table">) rather than sending a signal
+    to the child process.
+   </para>
   </sect1>

   <sect1 id="preventing-server-spoofing">
Index: src/backend/tcop/postgres.c
===================================================================
RCS file: /cvsroot/pgsql/src/backend/tcop/postgres.c,v
retrieving revision 1.548
diff -c -c -r1.548 postgres.c
*** src/backend/tcop/postgres.c    2 Apr 2008 18:31:50 -0000    1.548
--- src/backend/tcop/postgres.c    14 Apr 2008 17:19:10 -0000
***************
*** 2541,2547 ****
           * waiting for input, however.
           */
          if (ImmediateInterruptOK && InterruptHoldoffCount == 0 &&
!             CritSectionCount == 0 && !DoingCommandRead)
          {
              /* bump holdoff count to make ProcessInterrupts() a no-op */
              /* until we are done getting ready for it */
--- 2541,2548 ----
           * waiting for input, however.
           */
          if (ImmediateInterruptOK && InterruptHoldoffCount == 0 &&
!             CritSectionCount == 0 &&
!             (!DoingCommandRead || MyProc->terminate))
          {
              /* bump holdoff count to make ProcessInterrupts() a no-op */
              /* until we are done getting ready for it */
***************
*** 2621,2626 ****
--- 2622,2631 ----
              ereport(ERROR,
                      (errcode(ERRCODE_QUERY_CANCELED),
                       errmsg("canceling autovacuum task")));
+         else if (MyProc->terminate)
+             ereport(ERROR,
+                     (errcode(ERRCODE_ADMIN_SHUTDOWN),
+                      errmsg("terminating backend due to administrator command")));
          else
              ereport(ERROR,
                      (errcode(ERRCODE_QUERY_CANCELED),
***************
*** 3459,3464 ****
--- 3464,3472 ----
          /* We don't have a transaction command open anymore */
          xact_started = false;

+         if (MyProc->terminate)
+             die(SIGINT);
+
          /* Now we can allow interrupts again */
          RESUME_INTERRUPTS();
      }
Index: src/backend/utils/adt/misc.c
===================================================================
RCS file: /cvsroot/pgsql/src/backend/utils/adt/misc.c,v
retrieving revision 1.59
diff -c -c -r1.59 misc.c
*** src/backend/utils/adt/misc.c    4 Apr 2008 16:57:21 -0000    1.59
--- src/backend/utils/adt/misc.c    14 Apr 2008 17:19:11 -0000
***************
*** 27,32 ****
--- 27,33 ----
  #include "postmaster/syslogger.h"
  #include "storage/fd.h"
  #include "storage/pmsignal.h"
+ #include "storage/proc.h"
  #include "storage/procarray.h"
  #include "utils/builtins.h"
  #include "tcop/tcopprot.h"
***************
*** 89,95 ****
   * Functions to send signals to other backends.
   */
  static bool
! pg_signal_backend(int pid, int sig)
  {
      if (!superuser())
          ereport(ERROR,
--- 90,96 ----
   * Functions to send signals to other backends.
   */
  static bool
! pg_signal_check(int pid)
  {
      if (!superuser())
          ereport(ERROR,
***************
*** 106,112 ****
--- 107,122 ----
                  (errmsg("PID %d is not a PostgreSQL server process", pid)));
          return false;
      }
+     else
+         return true;
+ }

+ /*
+  * Functions to send signals to other backends.
+  */
+ static bool
+ pg_signal_backend(int pid, int sig)
+ {
      /* If we have setsid(), signal the backend's whole process group */
  #ifdef HAVE_SETSID
      if (kill(-pid, sig))
***************
*** 125,131 ****
  Datum
  pg_cancel_backend(PG_FUNCTION_ARGS)
  {
!     PG_RETURN_BOOL(pg_signal_backend(PG_GETARG_INT32(0), SIGINT));
  }

  Datum
--- 135,177 ----
  Datum
  pg_cancel_backend(PG_FUNCTION_ARGS)
  {
!     int pid = PG_GETARG_INT32(0);
!
!     if (pg_signal_check(pid))
!         PG_RETURN_BOOL(pg_signal_backend(pid, SIGINT));
!     else
!         PG_RETURN_BOOL(false);
! }
!
! /*
!  *    To cleanly terminate a backend, we set PGPROC(pid)->terminate
!  *    then send a cancel signal.  We get ProcArrayLock only when
!  *    setting PGPROC->terminate so the function might fail in
!  *    several places, but that is fine because in those cases the
!  *    backend is already gone.
!  */
! Datum
! pg_terminate_backend(PG_FUNCTION_ARGS)
! {
!     int pid = PG_GETARG_INT32(0);
!     volatile PGPROC *term_proc;
!
!     /* Is this the super-user, and can we find the PGPROC entry for the pid? */
!     if (pg_signal_check(pid) && (term_proc = BackendPidGetProc(pid)) != NULL)
!     {
!         LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE);
!         /* Recheck now that we have the ProcArray lock. */
!         if (term_proc->pid == pid)
!         {
!             term_proc->terminate = true;
!             LWLockRelease(ProcArrayLock);
!             PG_RETURN_BOOL(pg_signal_backend(pid, SIGINT));
!         }
!         else
!             LWLockRelease(ProcArrayLock);
!     }
!
!     PG_RETURN_BOOL(false);
  }

  Datum
***************
*** 169,185 ****
      PG_RETURN_BOOL(true);
  }

- #ifdef NOT_USED
-
- /* Disabled in 8.0 due to reliability concerns; FIXME someday */
- Datum
- pg_terminate_backend(PG_FUNCTION_ARGS)
- {
-     PG_RETURN_INT32(pg_signal_backend(PG_GETARG_INT32(0), SIGTERM));
- }
- #endif
-
-
  /* Function to find out which databases make use of a tablespace */

  typedef struct
--- 215,220 ----
Index: src/include/catalog/pg_proc.h
===================================================================
RCS file: /cvsroot/pgsql/src/include/catalog/pg_proc.h,v
retrieving revision 1.488
diff -c -c -r1.488 pg_proc.h
*** src/include/catalog/pg_proc.h    10 Apr 2008 22:25:25 -0000    1.488
--- src/include/catalog/pg_proc.h    14 Apr 2008 17:19:11 -0000
***************
*** 3157,3162 ****
--- 3157,3164 ----

  DATA(insert OID = 2171 ( pg_cancel_backend        PGNSP PGUID 12 1 0 f f t f v 1 16 "23" _null_ _null_ _null_
pg_cancel_backend- _null_ _null_ )); 
  DESCR("cancel a server process' current query");
+ DATA(insert OID = 2096 ( pg_terminate_backend        PGNSP PGUID 12 1 0 f f t f v 1 16 "23" _null_ _null_ _null_
pg_terminate_backend- _null_ _null_ )); 
+ DESCR("terminate a server process");
  DATA(insert OID = 2172 ( pg_start_backup        PGNSP PGUID 12 1 0 f f t f v 1 25 "25" _null_ _null_ _null_
pg_start_backup- _null_ _null_ )); 
  DESCR("prepare for taking an online backup");
  DATA(insert OID = 2173 ( pg_stop_backup            PGNSP PGUID 12 1 0 f f t f v 0 25 "" _null_ _null_ _null_
pg_stop_backup- _null_ _null_ )); 
Index: src/include/storage/proc.h
===================================================================
RCS file: /cvsroot/pgsql/src/include/storage/proc.h,v
retrieving revision 1.104
diff -c -c -r1.104 proc.h
*** src/include/storage/proc.h    26 Jan 2008 19:55:08 -0000    1.104
--- src/include/storage/proc.h    14 Apr 2008 17:19:11 -0000
***************
*** 91,96 ****
--- 91,98 ----

      bool        inCommit;        /* true if within commit critical section */

+     bool        terminate;        /* admin requested termination */
+
      uint8        vacuumFlags;    /* vacuum-related flags, see above */

      /* Info about LWLock the process is currently waiting for, if any. */
Index: src/include/utils/builtins.h
===================================================================
RCS file: /cvsroot/pgsql/src/include/utils/builtins.h,v
retrieving revision 1.312
diff -c -c -r1.312 builtins.h
*** src/include/utils/builtins.h    4 Apr 2008 18:45:36 -0000    1.312
--- src/include/utils/builtins.h    14 Apr 2008 17:19:11 -0000
***************
*** 416,421 ****
--- 416,422 ----
  extern Datum current_database(PG_FUNCTION_ARGS);
  extern Datum current_query(PG_FUNCTION_ARGS);
  extern Datum pg_cancel_backend(PG_FUNCTION_ARGS);
+ extern Datum pg_terminate_backend(PG_FUNCTION_ARGS);
  extern Datum pg_reload_conf(PG_FUNCTION_ARGS);
  extern Datum pg_tablespace_databases(PG_FUNCTION_ARGS);
  extern Datum pg_rotate_logfile(PG_FUNCTION_ARGS);

Re: Terminating a backend

From
Bruce Momjian
Date:
Applied.

---------------------------------------------------------------------------

Bruce Momjian wrote:
> oikBruce Momjian wrote:
> > Bruce Momjian wrote:
> > > > > When we get the termination signal, why can't we just set a global
> > > > > boolean, do a query cancel, and in the setjmp() code block check the
> > > > > global and exit --- at that stage we know we have released all locks and
> > > > > can exit cleanly.
> > > >
> > > > I have implemented this idea with the attached patch.
> > >
> > > One problem I have with my patch is that SIGTERM is currently used by
> > > the postmaster to shut down backends.  Now because the postmaster knows
> > > that all backend are terminating, it can accept a dirtier shutdown than
> > > one where we are terminating just one backend and the rest are going to
> > > keep running.  The new SIGTERM coding is going to exit a backend only in
> > > a place where cancel is checked.
> >
> > I have a idea --- to have pg_terminate_backend() set a PGPROC boolean
> > and then send a query cancel signal to the backend --- the backend can
> > then check the boolean and exit if required.  I will work on a new
> > version of this patch tomorrow/Monday.
>
> Updated patch attached.  I didn't modify SIGTERM at all but set a PRPROC
> boolean and piggybacked on SIGINT.  I think I got the PGPROC locking
> right.  I had to split apart pg_signal_backend() so I could do the
> permission and pid checks independent of the signalling, because I
> pg_terminate_backend() needs to check, then set the PGPROC variable,
> then send the signal.
>
> I also added an administration doc mention about when to use
> pg_terminate_backend().
>
> --
>   Bruce Momjian  <bruce@momjian.us>        http://momjian.us
>   EnterpriseDB                             http://enterprisedb.com
>
>   + If your life is a hard drive, Christ can be your backup. +


>
> --
> Sent via pgsql-patches mailing list (pgsql-patches@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-patches

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +