Thread: Re: [HACKERS] Function to kill backend
I have applied the attached patch: Exit backend from SIGTERM or FATAL by simulating client EOF, rather than calling proc_exit() directly. This should make SIGTERM more reliable. --------------------------------------------------------------------------- Tom Lane wrote: > Bruce Momjian <pgman@candle.pha.pa.us> writes: > > On first glance, I don't see anything dangerous about SIGTERM. > > You haven't thought about it very hard :-( > > The major difference I see is that elog(FATAL) will call proc_exit > directly from elog, rather than longjmp'ing back to PostgresMain. > The case that we have confidence in involves elog(ERROR) returning to > PostgresMain and then calling proc_exit from there (in the path where > we get EOF from the client). > > This leaves me with a couple of concerns: > > * Notice all that cleanup/reset stuff in the "if (sigsetjmp())" block > in PostgresMain. SIGTERM will cause proc_exit to be entered without > any of that being done first. Does it work reliably? Shouldn't this be > refactored to ensure the same things happen in both cases? > > * There are various places, especially in the PLs, that try to hook into > error recovery by manipulating Warn_restart. Will any of them have > problems if their error recovery code doesn't get called during SIGTERM > exit? > > One possible refactoring is for elog(FATAL) to go ahead and longjmp back > to PostgresMain, and at the end of the error recovery block check a flag > and do proc_exit() if we're fataling. However I am not sure that this > doesn't break the design for coping with elog's during proc_exit. > > Alvaro's nested-transaction work is another thing that's got to be > thought about before touching this code. I have not yet seen any design > for error recovery in the nested xact case, but I am sure it's going to > need some changes right around here. > > regards, tom lane > -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania 19073 Index: src/backend/tcop/postgres.c =================================================================== RCS file: /cvsroot/pgsql-server/src/backend/tcop/postgres.c,v retrieving revision 1.398 diff -c -c -r1.398 postgres.c *** src/backend/tcop/postgres.c 7 Apr 2004 05:05:49 -0000 1.398 --- src/backend/tcop/postgres.c 9 Apr 2004 17:42:46 -0000 *************** *** 2938,2944 **** /* * (3) read a command (loop blocks here) */ ! firstchar = ReadCommand(&input_message); /* * (4) disable async signal conditions again. --- 2938,2947 ---- /* * (3) read a command (loop blocks here) */ ! if (!in_fatal_exit) ! firstchar = ReadCommand(&input_message); ! else ! firstchar = EOF; /* * (4) disable async signal conditions again. *************** *** 3170,3176 **** * Otherwise it will fail to be called during other * backend-shutdown scenarios. */ ! proc_exit(0); case 'd': /* copy data */ case 'c': /* copy done */ --- 3173,3180 ---- * Otherwise it will fail to be called during other * backend-shutdown scenarios. */ ! proc_exit(!in_fatal_exit ? 0 : proc_exit_inprogress || ! !IsUnderPostmaster); case 'd': /* copy data */ case 'c': /* copy done */ Index: src/backend/utils/error/elog.c =================================================================== RCS file: /cvsroot/pgsql-server/src/backend/utils/error/elog.c,v retrieving revision 1.132 diff -c -c -r1.132 elog.c *** src/backend/utils/error/elog.c 5 Apr 2004 03:02:06 -0000 1.132 --- src/backend/utils/error/elog.c 9 Apr 2004 17:42:47 -0000 *************** *** 72,77 **** --- 72,79 ---- char *Log_line_prefix = NULL; /* format for extra log line info */ unsigned int Log_destination; + bool in_fatal_exit = false; + #ifdef HAVE_SYSLOG char *Syslog_facility; /* openlog() parameters */ char *Syslog_ident; *************** *** 442,448 **** */ fflush(stdout); fflush(stderr); ! proc_exit(proc_exit_inprogress || !IsUnderPostmaster); } /* --- 444,455 ---- */ fflush(stdout); fflush(stderr); ! ! if (in_fatal_exit) ! ereport(PANIC, (errmsg("fatal error during fatal exit, giving up"))); ! ! /* We will exit the backend by simulating a client EOF */ ! in_fatal_exit = true; } /* Index: src/include/tcop/tcopprot.h =================================================================== RCS file: /cvsroot/pgsql-server/src/include/tcop/tcopprot.h,v retrieving revision 1.64 diff -c -c -r1.64 tcopprot.h *** src/include/tcop/tcopprot.h 7 Apr 2004 05:05:50 -0000 1.64 --- src/include/tcop/tcopprot.h 9 Apr 2004 17:42:47 -0000 *************** *** 34,39 **** --- 34,40 ---- extern DLLIMPORT const char *debug_query_string; extern char *rendezvous_name; extern int max_stack_depth; + extern bool in_fatal_exit; /* GUC-configurable parameters */
Bruce Momjian <pgman@candle.pha.pa.us> writes: > I have applied the attached patch: > Exit backend from SIGTERM or FATAL by simulating client EOF, rather than > calling proc_exit() directly. This should make SIGTERM more reliable. After further consideration I have concluded that this was a spectacularly bad idea and we should revert that patch. There is a very large amount of processing that this patch will cause to happen after a FATAL error has been declared, and I doubt that any of it is a good idea. Some examples: * AbortCurrentTransaction() --- not too cool if the FATAL error was one of the ones in xact.c that are complaining of fatally bollixed transaction state. * pgstat reporting --- aside from the chance of an outright crash, we might be transmitting bogus statistics to the collector. * sending a ReadyForQuery (Z) message --- one thing we quite certainly ain't is ReadyForQuery. * EnableNotifyInterrupt --- this may result in actually trying to run a transaction to look through pg_listener :-( * ProcessConfigFile, if we had a pending SIGHUP --- also not too cool, if the FATAL was from guc.c. I am still dubious that zapping random backends with SIGTERM is a sane or supportable idea. But this patch does not make things better, it simply greatly increases the chance of a FATAL exit turning into a backend crash or PANIC. regards, tom lane
OK, I see your point. Can anyone remember why this was needed? I remember Magnus wanted query cancel, but what was the logic for session termination? --------------------------------------------------------------------------- Tom Lane wrote: > Bruce Momjian <pgman@candle.pha.pa.us> writes: > > I have applied the attached patch: > > Exit backend from SIGTERM or FATAL by simulating client EOF, rather than > > calling proc_exit() directly. This should make SIGTERM more reliable. > > After further consideration I have concluded that this was a > spectacularly bad idea and we should revert that patch. There is a very > large amount of processing that this patch will cause to happen after a > FATAL error has been declared, and I doubt that any of it is a good > idea. Some examples: > > * AbortCurrentTransaction() --- not too cool if the FATAL error was one > of the ones in xact.c that are complaining of fatally bollixed > transaction state. > > * pgstat reporting --- aside from the chance of an outright crash, we > might be transmitting bogus statistics to the collector. > > * sending a ReadyForQuery (Z) message --- one thing we quite certainly > ain't is ReadyForQuery. > > * EnableNotifyInterrupt --- this may result in actually trying to run > a transaction to look through pg_listener :-( > > * ProcessConfigFile, if we had a pending SIGHUP --- also not too cool, > if the FATAL was from guc.c. > > > I am still dubious that zapping random backends with SIGTERM is a sane > or supportable idea. But this patch does not make things better, it > simply greatly increases the chance of a FATAL exit turning into a > backend crash or PANIC. > > regards, tom lane > -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania 19073
Bruce Momjian <pgman@candle.pha.pa.us> writes: > OK, I see your point. Can anyone remember why this was needed? I > remember Magnus wanted query cancel, but what was the logic for session > termination? I think the argument for it went along the lines of "what if you've got a misbehaving client that continually issues expensive queries, so canceling any one query doesn't help?" I found this unconvincing, as a client that obstreperous might simply reconnect and start issuing the same queries again; so having a session-kill tool doesn't really get you much further. I recall being voted down though ... regards, tom lane
Tom Lane wrote: > Bruce Momjian <pgman@candle.pha.pa.us> writes: > > OK, I see your point. Can anyone remember why this was needed? I > > remember Magnus wanted query cancel, but what was the logic for session > > termination? > > I think the argument for it went along the lines of "what if you've got > a misbehaving client that continually issues expensive queries, so > canceling any one query doesn't help?" I found this unconvincing, as > a client that obstreperous might simply reconnect and start issuing the > same queries again; so having a session-kill tool doesn't really get you > much further. I recall being voted down though ... Well, seems you have enough reason to suspect this code, and being so near to beta I think we can remove it and come back to it later if people still want it. I remember the "cancel" being the key feature, and termination was more of a "nice have" item. -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania 19073