Thread: How to know killed by pg_terminate_backend
Hi, If a backend killed by pg_terminate_backend(), the backend returns 57P01 which is identical to the one when it's killed by postmaster. Problem is, pgpool-II needs to trigger failover if postmaster goes down because apparently pgpool-II cannot use the PostgreSQL server anymore. On the otherhand, pg_terminate_backend() just terminates a backend. So triggering failover is overkill. Maybe we could make PostgreSQL a little bit smarter so that it returns a different code than 57P01 when killed by pg_terminate_backend(). Comments? -- Tatsuo Ishii SRA OSS, Inc. Japan English: http://www.sraoss.co.jp/index_en.php Japanese: http://www.sraoss.co.jp
Tatsuo Ishii wrote: > If a backend killed by pg_terminate_backend(), the backend returns > 57P01 which is identical to the one when it's killed by postmaster. > > Problem is, pgpool-II needs to trigger failover if postmaster goes > down because apparently pgpool-II cannot use the PostgreSQL server > anymore. > > On the otherhand, pg_terminate_backend() just terminates a backend. So > triggering failover is overkill. > > Maybe we could make PostgreSQL a little bit smarter so that it returns > a different code than 57P01 when killed by pg_terminate_backend(). Seems reasonable. Does the victim backend currently know why it has been killed? -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
> > Maybe we could make PostgreSQL a little bit smarter so that it returns > > a different code than 57P01 when killed by pg_terminate_backend(). > > Seems reasonable. Does the victim backend currently know why it has been > killed? I don't think so. One idea is postmaster sets a flag in the shared memory area indicating it rceived SIGTERM before forwarding the signal to backends. Backend check the flag and if it's not set, it knows that the signal has been sent by pg_terminate_backend(), not postmaster. What about new error code: #define ERRCODE_BACKEND_STOP_REQUEST MAKE_SQLSTATE('5','7', 'P','0','4') -- Tatsuo Ishii SRA OSS, Inc. Japan English: http://www.sraoss.co.jp/index_en.php Japanese: http://www.sraoss.co.jp
On Thu, May 13, 2010 at 8:20 PM, Tatsuo Ishii <ishii@postgresql.org> wrote: >> > Maybe we could make PostgreSQL a little bit smarter so that it returns >> > a different code than 57P01 when killed by pg_terminate_backend(). >> >> Seems reasonable. Does the victim backend currently know why it has been >> killed? > > I don't think so. > > One idea is postmaster sets a flag in the shared memory area > indicating it rceived SIGTERM before forwarding the signal to > backends. > > Backend check the flag and if it's not set, it knows that the signal > has been sent by pg_terminate_backend(), not postmaster. Or it could also be sent by some other user process, like the user running "kill" from the shell. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company
> >> Seems reasonable. Does the victim backend currently know why it has been > >> killed? > > > > I don't think so. > > > > One idea is postmaster sets a flag in the shared memory area > > indicating it rceived SIGTERM before forwarding the signal to > > backends. > > > > Backend check the flag and if it's not set, it knows that the signal > > has been sent by pg_terminate_backend(), not postmaster. > > Or it could also be sent by some other user process, like the user > running "kill" from the shell. No problem (at least for pgpool-II). If the flag is not set, postgres returns the same code as the one killed by pg_terminate_backend(). The point is, backend is killed by postmaster or not. Because if backend was killed by postmaster, pgpool-II should not expect the PostgreSQL server is usable since postmaster decided to shutdown. -- Tatsuo Ishii SRA OSS, Inc. Japan English: http://www.sraoss.co.jp/index_en.php Japanese: http://www.sraoss.co.jp
>> >> Seems reasonable. Does the victim backend currently know why it has been >> >> killed? >> > >> > I don't think so. >> > >> > One idea is postmaster sets a flag in the shared memory area >> > indicating it rceived SIGTERM before forwarding the signal to >> > backends. >> > >> > Backend check the flag and if it's not set, it knows that the signal >> > has been sent by pg_terminate_backend(), not postmaster. >> >> Or it could also be sent by some other user process, like the user >> running "kill" from the shell. > > No problem (at least for pgpool-II). > > If the flag is not set, postgres returns the same code as the one > killed by pg_terminate_backend(). The point is, backend is killed by > postmaster or not. Because if backend was killed by postmaster, > pgpool-II should not expect the PostgreSQL server is usable since > postmaster decided to shutdown. Here is the patch to implement the feature. 1) pg_terminate_backend() sends SIGUSR1 signal rather than SIGTERM to the target backend. 2) The infrastructure used for message passing is storage/ipc/procsignal.c The new message type for ProcSignalReason is"PROCSIG_TERMNINATE_BACKEND_INTERRUPT"3) I assign new error code 57P04 which is returned from the backend killed bypg_terminate_backend(). #define ERRCODE_TERMINATE_BACKEND MAKE_SQLSTATE('5','7', 'P','0','4') Comments are welcome. -- Tatsuo Ishii SRA OSS, Inc. Japan English: http://www.sraoss.co.jp/index_en.php Japanese: http://www.sraoss.co.jp *** a/src/backend/storage/ipc/procsignal.c --- b/src/backend/storage/ipc/procsignal.c *************** *** 279,284 **** procsignal_sigusr1_handler(SIGNAL_ARGS) --- 279,287 ---- if (CheckProcSignal(PROCSIG_RECOVERY_CONFLICT_BUFFERPIN)) RecoveryConflictInterrupt(PROCSIG_RECOVERY_CONFLICT_BUFFERPIN); + if (CheckProcSignal(PROCSIG_TERMNINATE_BACKEND_INTERRUPT)) + HandleTerminateBackendInterrupt(); + latch_sigusr1_handler(); errno = save_errno; *** a/src/backend/tcop/postgres.c --- b/src/backend/tcop/postgres.c *************** *** 184,189 **** static bool RecoveryConflictPending = false; --- 184,195 ---- static bool RecoveryConflictRetryable = true; static ProcSignalReason RecoveryConflictReason; + /* + * True if backend is being killed by pg_terminate_backend(). + * Set by HandleTerminateBackendInterrupt() upon received SIGUSR1. + */ + static bool TerminateBackendRequest = false; + /* ---------------------------------------------------------------- * decls for routines only used in this file * ---------------------------------------------------------------- *************** *** 2875,2880 **** RecoveryConflictInterrupt(ProcSignalReason reason) --- 2881,2924 ---- } /* + * HandleTerminateBackendInterrupt: out-of-line portion of terminate backend + * handling following receipt of SIGUSR1. Designed to be similar to die(). + * Called only by a normal user backend. + */ + void + HandleTerminateBackendInterrupt(void) + { + int save_errno = errno; + + /* Don't joggle the elbow of proc_exit */ + if (!proc_exit_inprogress) + { + InterruptPending = true; + ProcDiePending = true; + TerminateBackendRequest = true; + + /* + * If it's safe to interrupt, and we're waiting for input or a lock, + * service the interrupt immediately + */ + if (ImmediateInterruptOK && InterruptHoldoffCount == 0 && + CritSectionCount == 0) + { + /* bump holdoff count to make ProcessInterrupts() a no-op */ + /* until we are done getting ready for it */ + InterruptHoldoffCount++; + LockWaitCancel(); /* prevent CheckDeadLock from running */ + DisableNotifyInterrupt(); + DisableCatchupInterrupt(); + InterruptHoldoffCount--; + ProcessInterrupts(); + } + } + + errno = save_errno; + } + + /* * ProcessInterrupts: out-of-line portion of CHECK_FOR_INTERRUPTS() macro * * If an interrupt condition is pending,and it's safe to service it, *************** *** 2912,2917 **** ProcessInterrupts(void) --- 2956,2966 ---- (errcode(ERRCODE_ADMIN_SHUTDOWN), errmsg("terminating connection dueto conflict with recovery"), errdetail_recovery_conflict())); + else if (TerminateBackendRequest) + ereport(FATAL, + (errcode(ERRCODE_TERMINATE_BACKEND), + errmsg("terminating connection due to pg_terminate_backend"))); + else ereport(FATAL, (errcode(ERRCODE_ADMIN_SHUTDOWN), *** a/src/backend/utils/adt/misc.c --- b/src/backend/utils/adt/misc.c *************** *** 114,120 **** pg_cancel_backend(PG_FUNCTION_ARGS) Datum pg_terminate_backend(PG_FUNCTION_ARGS) { ! PG_RETURN_BOOL(pg_signal_backend(PG_GETARG_INT32(0), SIGTERM)); } Datum --- 114,122 ---- Datum pg_terminate_backend(PG_FUNCTION_ARGS) { ! PG_RETURN_BOOL( ! SendProcSignal(PG_GETARG_INT32(0), PROCSIG_TERMNINATE_BACKEND_INTERRUPT, ! InvalidBackendId) == 0); } Datum *** a/src/include/storage/procsignal.h --- b/src/include/storage/procsignal.h *************** *** 40,45 **** typedef enum --- 40,47 ---- PROCSIG_RECOVERY_CONFLICT_BUFFERPIN, PROCSIG_RECOVERY_CONFLICT_STARTUP_DEADLOCK, + PROCSIG_TERMNINATE_BACKEND_INTERRUPT, /* terminate request from pg_terminate_backend() */ + NUM_PROCSIGNALS /* Must be last! */ } ProcSignalReason; *************** *** 55,58 **** extern int SendProcSignal(pid_t pid, ProcSignalReason reason, --- 57,62 ---- extern void procsignal_sigusr1_handler(SIGNAL_ARGS); + extern void HandleTerminateBackendInterrupt(void); + #endif /* PROCSIGNAL_H */ *** a/src/include/utils/errcodes.h --- b/src/include/utils/errcodes.h *************** *** 332,337 **** --- 332,338 ---- #define ERRCODE_ADMIN_SHUTDOWN MAKE_SQLSTATE('5','7', 'P','0','1') #define ERRCODE_CRASH_SHUTDOWN MAKE_SQLSTATE('5','7', 'P','0','2') #define ERRCODE_CANNOT_CONNECT_NOW MAKE_SQLSTATE('5','7','P','0','3') + #define ERRCODE_TERMINATE_BACKEND MAKE_SQLSTATE('5','7', 'P','0','4') /* Class 58 - System Error (class borrowedfrom DB2) */ /* (we define this as errors external to PostgreSQL itself) */
>> >> Seems reasonable. Does the victim backend currently know why it has been >> >> killed? >> > >> > I don't think so. >> > >> > One idea is postmaster sets a flag in the shared memory area >> > indicating it rceived SIGTERM before forwarding the signal to >> > backends. >> > >> > Backend check the flag and if it's not set, it knows that the signal >> > has been sent by pg_terminate_backend(), not postmaster. >> >> Or it could also be sent by some other user process, like the user >> running "kill" from the shell. > > No problem (at least for pgpool-II). > > If the flag is not set, postgres returns the same code as the one > killed by pg_terminate_backend(). The point is, backend is killed by > postmaster or not. Because if backend was killed by postmaster, > pgpool-II should not expect the PostgreSQL server is usable since > postmaster decided to shutdown. Here is the patch to implement the feature. 1) pg_terminate_backend() sends SIGUSR1 signal rather than SIGTERM to the target backend. 2) The infrastructure used for message passing is storage/ipc/procsignal.c The new message type for ProcSignalReason is"PROCSIG_TERMNINATE_BACKEND_INTERRUPT" 3) I assign new error code 57P04 which is returned from the backend killed by pg_terminate_backend(). #define ERRCODE_TERMINATE_BACKEND MAKE_SQLSTATE('5','7', 'P','0','4') Comments are welcome. -- Tatsuo Ishii SRA OSS, Inc. Japan English: http://www.sraoss.co.jp/index_en.php Japanese: http://www.sraoss.co.jp *** a/src/backend/storage/ipc/procsignal.c --- b/src/backend/storage/ipc/procsignal.c *************** *** 279,284 **** procsignal_sigusr1_handler(SIGNAL_ARGS) --- 279,287 ---- if (CheckProcSignal(PROCSIG_RECOVERY_CONFLICT_BUFFERPIN)) RecoveryConflictInterrupt(PROCSIG_RECOVERY_CONFLICT_BUFFERPIN); + if (CheckProcSignal(PROCSIG_TERMNINATE_BACKEND_INTERRUPT)) + HandleTerminateBackendInterrupt(); + latch_sigusr1_handler(); errno = save_errno; *** a/src/backend/tcop/postgres.c --- b/src/backend/tcop/postgres.c *************** *** 184,189 **** static bool RecoveryConflictPending = false; --- 184,195 ---- static bool RecoveryConflictRetryable = true; static ProcSignalReason RecoveryConflictReason; + /* + * True if backend is being killed by pg_terminate_backend(). + * Set by HandleTerminateBackendInterrupt() upon received SIGUSR1. + */ + static bool TerminateBackendRequest = false; + /* ---------------------------------------------------------------- * decls for routines only used in this file * ---------------------------------------------------------------- *************** *** 2875,2880 **** RecoveryConflictInterrupt(ProcSignalReason reason) --- 2881,2924 ---- } /* + * HandleTerminateBackendInterrupt: out-of-line portion of terminate backend + * handling following receipt of SIGUSR1. Designed to be similar to die(). + * Called only by a normal user backend. + */ + void + HandleTerminateBackendInterrupt(void) + { + int save_errno = errno; + + /* Don't joggle the elbow of proc_exit */ + if (!proc_exit_inprogress) + { + InterruptPending = true; + ProcDiePending = true; + TerminateBackendRequest = true; + + /* + * If it's safe to interrupt, and we're waiting for input or a lock, + * service the interrupt immediately + */ + if (ImmediateInterruptOK && InterruptHoldoffCount == 0 && + CritSectionCount == 0) + { + /* bump holdoff count to make ProcessInterrupts() a no-op */ + /* until we are done getting ready for it */ + InterruptHoldoffCount++; + LockWaitCancel(); /* prevent CheckDeadLock from running */ + DisableNotifyInterrupt(); + DisableCatchupInterrupt(); + InterruptHoldoffCount--; + ProcessInterrupts(); + } + } + + errno = save_errno; + } + + /* * ProcessInterrupts: out-of-line portion of CHECK_FOR_INTERRUPTS() macro * * If an interrupt condition is pending,and it's safe to service it, *************** *** 2912,2917 **** ProcessInterrupts(void) --- 2956,2966 ---- (errcode(ERRCODE_ADMIN_SHUTDOWN), errmsg("terminating connection dueto conflict with recovery"), errdetail_recovery_conflict())); + else if (TerminateBackendRequest) + ereport(FATAL, + (errcode(ERRCODE_TERMINATE_BACKEND), + errmsg("terminating connection due to pg_terminate_backend"))); + else ereport(FATAL, (errcode(ERRCODE_ADMIN_SHUTDOWN), *** a/src/backend/utils/adt/misc.c --- b/src/backend/utils/adt/misc.c *************** *** 114,120 **** pg_cancel_backend(PG_FUNCTION_ARGS) Datum pg_terminate_backend(PG_FUNCTION_ARGS) { ! PG_RETURN_BOOL(pg_signal_backend(PG_GETARG_INT32(0), SIGTERM)); } Datum --- 114,122 ---- Datum pg_terminate_backend(PG_FUNCTION_ARGS) { ! PG_RETURN_BOOL( ! SendProcSignal(PG_GETARG_INT32(0), PROCSIG_TERMNINATE_BACKEND_INTERRUPT, ! InvalidBackendId) == 0); } Datum *** a/src/include/storage/procsignal.h --- b/src/include/storage/procsignal.h *************** *** 40,45 **** typedef enum --- 40,47 ---- PROCSIG_RECOVERY_CONFLICT_BUFFERPIN, PROCSIG_RECOVERY_CONFLICT_STARTUP_DEADLOCK, + PROCSIG_TERMNINATE_BACKEND_INTERRUPT, /* terminate request from pg_terminate_backend() */ + NUM_PROCSIGNALS /* Must be last! */ } ProcSignalReason; *************** *** 55,58 **** extern int SendProcSignal(pid_t pid, ProcSignalReason reason, --- 57,62 ---- extern void procsignal_sigusr1_handler(SIGNAL_ARGS); + extern void HandleTerminateBackendInterrupt(void); + #endif /* PROCSIGNAL_H */ *** a/src/include/utils/errcodes.h --- b/src/include/utils/errcodes.h *************** *** 332,337 **** --- 332,338 ---- #define ERRCODE_ADMIN_SHUTDOWN MAKE_SQLSTATE('5','7', 'P','0','1') #define ERRCODE_CRASH_SHUTDOWN MAKE_SQLSTATE('5','7', 'P','0','2') #define ERRCODE_CANNOT_CONNECT_NOW MAKE_SQLSTATE('5','7','P','0','3') + #define ERRCODE_TERMINATE_BACKEND MAKE_SQLSTATE('5','7', 'P','0','4') /* Class 58 - System Error (class borrowedfrom DB2) */ /* (we define this as errors external to PostgreSQL itself) */
Tatsuo Ishii <ishii@postgresql.org> writes: > Comments are welcome. This is a bad idea. It makes an already-poorly-tested code path significantly more fragile, in return for nothing of value. regards, tom lane
> Tatsuo Ishii <ishii@postgresql.org> writes: >> Comments are welcome. > > This is a bad idea. It makes an already-poorly-tested code path > significantly more fragile, in return for nothing of value. Are you saying that procsignal.c is the already-poorly-tested one? If so, why? As for "value", I have already explained why we need this in the upthread. -- Tatsuo Ishii SRA OSS, Inc. Japan English: http://www.sraoss.co.jp/index_en.php Japanese: http://www.sraoss.co.jp
> Here is the patch to implement the feature. > > 1) pg_terminate_backend() sends SIGUSR1 signal rather than SIGTERM to > the target backend. > 2) The infrastructure used for message passing is > storage/ipc/procsignal.c The new message type for ProcSignalReason > is "PROCSIG_TERMNINATE_BACKEND_INTERRUPT" > 3) I assign new error code 57P04 which is returned from the backend > killed by pg_terminate_backend(). > > #define ERRCODE_TERMINATE_BACKEND MAKE_SQLSTATE('5','7', 'P','0','4') > > Comments are welcome. Anyone has better idea? Tom dislikes my patch but I don't know how to deal with it. -- Tatsuo Ishii SRA OSS, Inc. Japan English: http://www.sraoss.co.jp/index_en.php Japanese: http://www.sraoss.co.jp
On Fri, Jan 21, 2011 at 13:56, Tatsuo Ishii <ishii@postgresql.org> wrote: >> Here is the patch to implement the feature. >> >> 1) pg_terminate_backend() sends SIGUSR1 signal rather than SIGTERM to >> the target backend. >> 2) The infrastructure used for message passing is >> storage/ipc/procsignal.c The new message type for ProcSignalReason >> is "PROCSIG_TERMNINATE_BACKEND_INTERRUPT" >> 3) I assign new error code 57P04 which is returned from the backend >> killed by pg_terminate_backend(). >> >> #define ERRCODE_TERMINATE_BACKEND MAKE_SQLSTATE('5','7', 'P','0','4') > > Anyone has better idea? Tom dislikes my patch but I don't know how to > deal with it. There was another design in the past discussion: >> One idea is postmaster sets a flag in the shared memory area >> indicating it rceived SIGTERM before forwarding the signal to >> backends. Is it enough for your purpose and do we think it is more robust way? -- Itagaki Takahiro
Itagaki Takahiro <itagaki.takahiro@gmail.com> writes: > On Fri, Jan 21, 2011 at 13:56, Tatsuo Ishii <ishii@postgresql.org> wrote: >> Anyone has better idea? Tom dislikes my patch but I don't know how to >> deal with it. > There was another design in the past discussion: > One idea is postmaster sets a flag in the shared memory area > indicating it rceived SIGTERM before forwarding the signal to > backends. > Is it enough for your purpose and do we think it is more robust way? To put this as briefly as possible: I don't want to add even one line of code to distinguish pg_terminate_backend from database-wide shutdown. That function should be a last-ditch tool, not something used on a daily basis. So I disagree with the premise as much as with any particular implementation. regards, tom lane
On Fri, Jan 21, 2011 at 10:35 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Itagaki Takahiro <itagaki.takahiro@gmail.com> writes: >> On Fri, Jan 21, 2011 at 13:56, Tatsuo Ishii <ishii@postgresql.org> wrote: >>> Anyone has better idea? Tom dislikes my patch but I don't know how to >>> deal with it. > >> There was another design in the past discussion: >> One idea is postmaster sets a flag in the shared memory area >> indicating it rceived SIGTERM before forwarding the signal to >> backends. > >> Is it enough for your purpose and do we think it is more robust way? > > To put this as briefly as possible: I don't want to add even one line of > code to distinguish pg_terminate_backend from database-wide shutdown. > That function should be a last-ditch tool, not something used on a daily > basis. So I disagree with the premise as much as with any particular > implementation. Well, that seems awfully unfriendly. Frequency of use is beside the point - people are trying to write client applications - like pgpool-II - that understand the behavior of PG. If we send the same error code in two different situations with different behaviors, such applications have to do so silly workarounds to figure out what really happened. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company