Re: pg_terminate_backend() issues - Mailing list pgsql-hackers

From Bruce Momjian
Subject Re: pg_terminate_backend() issues
Date
Msg-id 200804160409.m3G49h104438@momjian.us
Whole thread Raw
In response to Re: pg_terminate_backend() issues  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
Tom Lane wrote:
> Bruce Momjian <bruce@momjian.us> writes:
> > Tom Lane wrote:
> >> I was able to get things to more or less work most of the time with
> >> three or four additional ugly hacks in postgres.c, but I still don't
> >> have any great confidence that there aren't windows where a terminate
> >> request wouldn't be ignored (even without considering the uncooperative-
> >> function scenario).  Moreover it's entirely unclear that this approach
> >> actually dodges any of the hypothetical bugs in SIGTERM handling.
>
> > I don't understand.  If we call proc_exit(0) instead, it is the same as
> > someone exiting the backend via PQfinish().
>
> Well, using proc_exit() instead of die() might have alleviated that
> particular objection, but there are still the others.

Tom, thanks for you feedback.  It was very helpful.

I have adjusted the patch to return a locked PGPROC entry, initialize
the PGPROC->terminate boolean, called proc_exit(), and moved the
PGPROC->terminate test out of the cancel loop entirely so lost cancel
signals are not as big a problem anymore.

I agree it would be best for pg_terminate_backend() and for the
postmaster use of SIGTERM if SIGTERM was more reliable about cleanup, so
hopefully that will work for 8.4.  I have attached my patch in hopes we
can use it as a backup in case we can't get SIGTERM to work for
individual backends for 8.4.

[ I have posted about testing in a separate email.]

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +
Index: doc/src/sgml/func.sgml
===================================================================
RCS file: /cvsroot/pgsql/doc/src/sgml/func.sgml,v
retrieving revision 1.432
diff -c -c -r1.432 func.sgml
*** doc/src/sgml/func.sgml    15 Apr 2008 20:28:46 -0000    1.432
--- doc/src/sgml/func.sgml    16 Apr 2008 03:59:20 -0000
***************
*** 11849,11854 ****
--- 11849,11857 ----
      <primary>pg_cancel_backend</primary>
     </indexterm>
     <indexterm>
+     <primary>pg_terminate_backend</primary>
+    </indexterm>
+    <indexterm>
      <primary>pg_reload_conf</primary>
     </indexterm>
     <indexterm>
***************
*** 11885,11890 ****
--- 11888,11900 ----
        </row>
        <row>
         <entry>
+         <literal><function>pg_terminate_backend</function>(<parameter>pid</parameter> <type>int</>)</literal>
+         </entry>
+        <entry><type>boolean</type></entry>
+        <entry>Terminate a backend</entry>
+       </row>
+       <row>
+        <entry>
          <literal><function>pg_reload_conf</function>()</literal>
          </entry>
         <entry><type>boolean</type></entry>
***************
*** 11907,11915 ****
     </para>

     <para>
!     <function>pg_cancel_backend</> sends a query cancel
!     (<systemitem>SIGINT</>) signal to a backend process identified by
!     process ID.  The process ID of an active backend can be found from
      the <structfield>procpid</structfield> column in the
      <structname>pg_stat_activity</structname> view, or by listing the
      <command>postgres</command> processes on the server with
--- 11917,11926 ----
     </para>

     <para>
!     <function>pg_cancel_backend</> and <function>pg_terminate_backend</>
!     send a query cancel (<systemitem>SIGINT</>) signal to a backend process
!     identified by process ID.  The
!     process ID of an active backend can be found from
      the <structfield>procpid</structfield> column in the
      <structname>pg_stat_activity</structname> view, or by listing the
      <command>postgres</command> processes on the server with
Index: doc/src/sgml/runtime.sgml
===================================================================
RCS file: /cvsroot/pgsql/doc/src/sgml/runtime.sgml,v
retrieving revision 1.413
diff -c -c -r1.413 runtime.sgml
*** doc/src/sgml/runtime.sgml    15 Apr 2008 20:28:46 -0000    1.413
--- doc/src/sgml/runtime.sgml    16 Apr 2008 03:59:20 -0000
***************
*** 1372,1377 ****
--- 1372,1384 ----
      well.
     </para>
    </important>
+
+   <para>
+    To terminate a session while allowing other sessions to continue, use
+    <function>pg_terminate_backend()</> (<xref
+    linkend="functions-admin-signal-table">) rather than sending a signal
+    to the child process.
+   </para>
   </sect1>

   <sect1 id="preventing-server-spoofing">
Index: src/backend/access/transam/twophase.c
===================================================================
RCS file: /cvsroot/pgsql/src/backend/access/transam/twophase.c,v
retrieving revision 1.41
diff -c -c -r1.41 twophase.c
*** src/backend/access/transam/twophase.c    25 Mar 2008 22:42:42 -0000    1.41
--- src/backend/access/transam/twophase.c    16 Apr 2008 03:59:20 -0000
***************
*** 283,288 ****
--- 283,289 ----
      gxact->proc.databaseId = databaseid;
      gxact->proc.roleId = owner;
      gxact->proc.inCommit = false;
+     gxact->proc.terminate = false;
      gxact->proc.vacuumFlags = 0;
      gxact->proc.lwWaiting = false;
      gxact->proc.lwExclusive = false;
Index: src/backend/storage/ipc/procarray.c
===================================================================
RCS file: /cvsroot/pgsql/src/backend/storage/ipc/procarray.c,v
retrieving revision 1.43
diff -c -c -r1.43 procarray.c
*** src/backend/storage/ipc/procarray.c    26 Mar 2008 18:48:59 -0000    1.43
--- src/backend/storage/ipc/procarray.c    16 Apr 2008 03:59:20 -0000
***************
*** 935,941 ****
   * answer to be used ...
   */
  PGPROC *
! BackendPidGetProc(int pid)
  {
      PGPROC       *result = NULL;
      ProcArrayStruct *arrayP = procArray;
--- 935,941 ----
   * answer to be used ...
   */
  PGPROC *
! BackendPidGetProc(int pid, bool want_shlock)
  {
      PGPROC       *result = NULL;
      ProcArrayStruct *arrayP = procArray;
***************
*** 957,964 ****
          }
      }

!     LWLockRelease(ProcArrayLock);
!
      return result;
  }

--- 957,966 ----
          }
      }

!     /* Do they want the proc returned with an shared lock? */
!     if (!result || !want_shlock)
!         LWLockRelease(ProcArrayLock);
!
      return result;
  }

***************
*** 1009,1015 ****
  bool
  IsBackendPid(int pid)
  {
!     return (BackendPidGetProc(pid) != NULL);
  }


--- 1011,1017 ----
  bool
  IsBackendPid(int pid)
  {
!     return (BackendPidGetProc(pid, false) != NULL);
  }


Index: src/backend/storage/lmgr/proc.c
===================================================================
RCS file: /cvsroot/pgsql/src/backend/storage/lmgr/proc.c,v
retrieving revision 1.199
diff -c -c -r1.199 proc.c
*** src/backend/storage/lmgr/proc.c    26 Jan 2008 19:55:08 -0000    1.199
--- src/backend/storage/lmgr/proc.c    16 Apr 2008 03:59:20 -0000
***************
*** 291,296 ****
--- 291,297 ----
      MyProc->databaseId = InvalidOid;
      MyProc->roleId = InvalidOid;
      MyProc->inCommit = false;
+     MyProc->terminate = false;
      MyProc->vacuumFlags = 0;
      if (IsAutoVacuumWorkerProcess())
          MyProc->vacuumFlags |= PROC_IS_AUTOVACUUM;
***************
*** 430,435 ****
--- 431,437 ----
      MyProc->databaseId = InvalidOid;
      MyProc->roleId = InvalidOid;
      MyProc->inCommit = false;
+     MyProc->terminate = false;
      /* we don't set the "is autovacuum" flag in the launcher */
      MyProc->vacuumFlags = 0;
      MyProc->lwWaiting = false;
***************
*** 1275,1281 ****
  void
  ProcSendSignal(int pid)
  {
!     PGPROC       *proc = BackendPidGetProc(pid);

      if (proc != NULL)
          PGSemaphoreUnlock(&proc->sem);
--- 1277,1283 ----
  void
  ProcSendSignal(int pid)
  {
!     PGPROC       *proc = BackendPidGetProc(pid, false);

      if (proc != NULL)
          PGSemaphoreUnlock(&proc->sem);
Index: src/backend/tcop/postgres.c
===================================================================
RCS file: /cvsroot/pgsql/src/backend/tcop/postgres.c,v
retrieving revision 1.550
diff -c -c -r1.550 postgres.c
*** src/backend/tcop/postgres.c    15 Apr 2008 20:28:46 -0000    1.550
--- src/backend/tcop/postgres.c    16 Apr 2008 03:59:23 -0000
***************
*** 2541,2547 ****
           * waiting for input, however.
           */
          if (ImmediateInterruptOK && InterruptHoldoffCount == 0 &&
!             CritSectionCount == 0 && !DoingCommandRead)
          {
              /* bump holdoff count to make ProcessInterrupts() a no-op */
              /* until we are done getting ready for it */
--- 2541,2548 ----
           * waiting for input, however.
           */
          if (ImmediateInterruptOK && InterruptHoldoffCount == 0 &&
!             CritSectionCount == 0 &&
!             (!DoingCommandRead || MyProc->terminate))
          {
              /* bump holdoff count to make ProcessInterrupts() a no-op */
              /* until we are done getting ready for it */
***************
*** 2621,2626 ****
--- 2622,2631 ----
              ereport(ERROR,
                      (errcode(ERRCODE_QUERY_CANCELED),
                       errmsg("canceling autovacuum task")));
+         else if (MyProc->terminate)
+             ereport(ERROR,
+                     (errcode(ERRCODE_ADMIN_SHUTDOWN),
+                      errmsg("terminating backend due to administrator command")));
          else
              ereport(ERROR,
                      (errcode(ERRCODE_QUERY_CANCELED),
***************
*** 3492,3497 ****
--- 3497,3505 ----

          initStringInfo(&input_message);

+         if (MyProc->terminate)
+             proc_exit(0);
+
          /*
           * (1) If we've reached idle state, tell the frontend we're ready for
           * a new query.
Index: src/backend/utils/adt/misc.c
===================================================================
RCS file: /cvsroot/pgsql/src/backend/utils/adt/misc.c,v
retrieving revision 1.61
diff -c -c -r1.61 misc.c
*** src/backend/utils/adt/misc.c    15 Apr 2008 20:28:46 -0000    1.61
--- src/backend/utils/adt/misc.c    16 Apr 2008 03:59:23 -0000
***************
*** 27,32 ****
--- 27,33 ----
  #include "postmaster/syslogger.h"
  #include "storage/fd.h"
  #include "storage/pmsignal.h"
+ #include "storage/proc.h"
  #include "storage/procarray.h"
  #include "utils/builtins.h"
  #include "tcop/tcopprot.h"
***************
*** 89,95 ****
   * Functions to send signals to other backends.
   */
  static bool
! pg_signal_backend(int pid, int sig)
  {
      if (!superuser())
          ereport(ERROR,
--- 90,96 ----
   * Functions to send signals to other backends.
   */
  static bool
! pg_signal_check(int pid)
  {
      if (!superuser())
          ereport(ERROR,
***************
*** 106,112 ****
--- 107,122 ----
                  (errmsg("PID %d is not a PostgreSQL server process", pid)));
          return false;
      }
+     else
+         return true;
+ }

+ /*
+  * Functions to send signals to other backends.
+  */
+ static bool
+ pg_signal_backend(int pid, int sig)
+ {
      /* If we have setsid(), signal the backend's whole process group */
  #ifdef HAVE_SETSID
      if (kill(-pid, sig))
***************
*** 125,131 ****
  Datum
  pg_cancel_backend(PG_FUNCTION_ARGS)
  {
!     PG_RETURN_BOOL(pg_signal_backend(PG_GETARG_INT32(0), SIGINT));
  }

  Datum
--- 135,168 ----
  Datum
  pg_cancel_backend(PG_FUNCTION_ARGS)
  {
!     int pid = PG_GETARG_INT32(0);
!
!     if (pg_signal_check(pid))
!         PG_RETURN_BOOL(pg_signal_backend(pid, SIGINT));
!     else
!         PG_RETURN_BOOL(false);
! }
!
! /*
!  *    To cleanly terminate a backend, we set PGPROC(pid)->terminate
!  *    then send a cancel signal.  SIGTERM isn't 100% safe for
!  *    cases where other backend will continue to run.
!  */
! Datum
! pg_terminate_backend(PG_FUNCTION_ARGS)
! {
!     int pid = PG_GETARG_INT32(0);
!     volatile PGPROC *term_proc;
!
!     /* Is this the super-user, and can we find the PGPROC entry for the pid? */
!     if (pg_signal_check(pid) && (term_proc = BackendPidGetProc(pid, true)) != NULL)
!     {
!         term_proc->terminate = true;
!         LWLockRelease(ProcArrayLock);
!         PG_RETURN_BOOL(pg_signal_backend(pid, SIGINT));
!     }
!
!     PG_RETURN_BOOL(false);
  }

  Datum
***************
*** 169,185 ****
      PG_RETURN_BOOL(true);
  }

- #ifdef NOT_USED
-
- /* Disabled in 8.0 due to reliability concerns; FIXME someday */
- Datum
- pg_terminate_backend(PG_FUNCTION_ARGS)
- {
-     PG_RETURN_INT32(pg_signal_backend(PG_GETARG_INT32(0), SIGTERM));
- }
- #endif
-
-
  /* Function to find out which databases make use of a tablespace */

  typedef struct
--- 206,211 ----
Index: src/include/catalog/pg_proc.h
===================================================================
RCS file: /cvsroot/pgsql/src/include/catalog/pg_proc.h,v
retrieving revision 1.491
diff -c -c -r1.491 pg_proc.h
*** src/include/catalog/pg_proc.h    15 Apr 2008 20:28:46 -0000    1.491
--- src/include/catalog/pg_proc.h    16 Apr 2008 03:59:23 -0000
***************
*** 3157,3162 ****
--- 3157,3164 ----

  DATA(insert OID = 2171 ( pg_cancel_backend        PGNSP PGUID 12 1 0 f f t f v 1 16 "23" _null_ _null_ _null_
pg_cancel_backend- _null_ _null_ )); 
  DESCR("cancel a server process' current query");
+ DATA(insert OID = 2096 ( pg_terminate_backend        PGNSP PGUID 12 1 0 f f t f v 1 16 "23" _null_ _null_ _null_
pg_terminate_backend- _null_ _null_ )); 
+ DESCR("terminate a server process");
  DATA(insert OID = 2172 ( pg_start_backup        PGNSP PGUID 12 1 0 f f t f v 1 25 "25" _null_ _null_ _null_
pg_start_backup- _null_ _null_ )); 
  DESCR("prepare for taking an online backup");
  DATA(insert OID = 2173 ( pg_stop_backup            PGNSP PGUID 12 1 0 f f t f v 0 25 "" _null_ _null_ _null_
pg_stop_backup- _null_ _null_ )); 
Index: src/include/storage/proc.h
===================================================================
RCS file: /cvsroot/pgsql/src/include/storage/proc.h,v
retrieving revision 1.106
diff -c -c -r1.106 proc.h
*** src/include/storage/proc.h    15 Apr 2008 20:28:47 -0000    1.106
--- src/include/storage/proc.h    16 Apr 2008 03:59:25 -0000
***************
*** 91,96 ****
--- 91,98 ----

      bool        inCommit;        /* true if within commit critical section */

+     bool        terminate;        /* admin requested termination */
+
      uint8        vacuumFlags;    /* vacuum-related flags, see above */

      /* Info about LWLock the process is currently waiting for, if any. */
Index: src/include/storage/procarray.h
===================================================================
RCS file: /cvsroot/pgsql/src/include/storage/procarray.h,v
retrieving revision 1.21
diff -c -c -r1.21 procarray.h
*** src/include/storage/procarray.h    26 Mar 2008 16:20:48 -0000    1.21
--- src/include/storage/procarray.h    16 Apr 2008 03:59:25 -0000
***************
*** 35,41 ****
  extern int    GetTransactionsInCommit(TransactionId **xids_p);
  extern bool HaveTransactionsInCommit(TransactionId *xids, int nxids);

! extern PGPROC *BackendPidGetProc(int pid);
  extern int    BackendXidGetPid(TransactionId xid);
  extern bool IsBackendPid(int pid);

--- 35,41 ----
  extern int    GetTransactionsInCommit(TransactionId **xids_p);
  extern bool HaveTransactionsInCommit(TransactionId *xids, int nxids);

! extern PGPROC *BackendPidGetProc(int pid, bool shlock);
  extern int    BackendXidGetPid(TransactionId xid);
  extern bool IsBackendPid(int pid);

Index: src/include/utils/builtins.h
===================================================================
RCS file: /cvsroot/pgsql/src/include/utils/builtins.h,v
retrieving revision 1.314
diff -c -c -r1.314 builtins.h
*** src/include/utils/builtins.h    15 Apr 2008 20:28:47 -0000    1.314
--- src/include/utils/builtins.h    16 Apr 2008 03:59:25 -0000
***************
*** 416,421 ****
--- 416,422 ----
  extern Datum current_database(PG_FUNCTION_ARGS);
  extern Datum current_query(PG_FUNCTION_ARGS);
  extern Datum pg_cancel_backend(PG_FUNCTION_ARGS);
+ extern Datum pg_terminate_backend(PG_FUNCTION_ARGS);
  extern Datum pg_reload_conf(PG_FUNCTION_ARGS);
  extern Datum pg_tablespace_databases(PG_FUNCTION_ARGS);
  extern Datum pg_rotate_logfile(PG_FUNCTION_ARGS);

pgsql-hackers by date:

Previous
From: Bruce Momjian
Date:
Subject: Re: pg_terminate_backend() issues
Next
From: Tom Lane
Date:
Subject: Re: pg_terminate_backend() issues