Thread: RE: SIGTERM -> elog(FATAL) -> proc_exit() is probably a bad idea

RE: SIGTERM -> elog(FATAL) -> proc_exit() is probably a bad idea

From
"Mikheev, Vadim"
Date:
> I think we'd be lots better off to abandon the notion that we can exit
> directly from the SIGTERM interrupt handler, and instead treat SIGTERM
> the same way we treat QueryCancel: set a flag that is inspected at
> specific places where we know we are in a good state.
> 
> Comments?

This will be much cleaner.

Vadim


RE: SIGTERM -> elog(FATAL) -> proc_exit() is probably a bad idea

From
"Hiroshi Inoue"
Date:
> -----Original Message-----
> From: Mikheev, Vadim
> 
> > I think we'd be lots better off to abandon the notion that we can exit
> > directly from the SIGTERM interrupt handler, and instead treat SIGTERM
> > the same way we treat QueryCancel: set a flag that is inspected at
> > specific places where we know we are in a good state.
> > 
> > Comments?
> 
> This will be much cleaner.
> 

Hmm, CancelQuery isn't so urgent an operation currently.
For example, VACUUM checks QueryCancel flag only
once per table.

Regards.
Hiroshi Inoue 


Re: SIGTERM -> elog(FATAL) -> proc_exit() is probably a bad idea

From
Tom Lane
Date:
>> I think we'd be lots better off to abandon the notion that we can exit
>> directly from the SIGTERM interrupt handler, and instead treat SIGTERM
>> the same way we treat QueryCancel: set a flag that is inspected at
>> specific places where we know we are in a good state.
>> 
>> Comments?

> This will be much cleaner.

OK, here's a more detailed proposal.

We'll eliminate elog() directly from the signal handlers, except in the
extremely limited case where the main line is waiting for a lock (the
existing "cancel wait for lock" mechanism for QueryCancel can be used
for SIGTERM as well, I think).  Otherwise, both die() and
QueryCancelHandler() will just set flags and return.  handle_warn()
(the SIGQUIT handler) should probably go away entirely; it does nothing
that's not done better by QueryCancel.  I'm inclined to make SIGQUIT
invoke die() the same as SIGTERM does, unless someone has a better idea
what to do with that signal.

I believe we should keep the "critical section count" mechanism that
Vadim already created, even though it is no longer needed to discourage
the signal handlers themselves from aborting processing.  With the
critical section counter, it is OK to have flag checks inside subroutines
that are safe to abort from in some contexts and not others --- the
contexts that don't want an abort just have to do START/END_CRIT_SECTION
to ensure that QueryCancel/ProcDie won't happen in routines they call.
So the basic flag checks are like "if (InterruptFlagSet &&
CritSectionCounter == 0) elog(...);".

Having done that, the $64 question is where to test the flags.

Obviously we can check for interrupts at all the places where
QueryCancel is tested now, which is primarily the outer loops.
I suggest we also check for interrupts in s_lock's wait loop (ie, we can
cancel/die if we haven't yet got the lock AND we are not in a crit
section), as well as in END_CRIT_SECTION.

I intend to devise a macro CHECK_FOR_INTERRUPTS() that embodies
all the test code, rather than duplicating these if-tests in
many places.

Note I am assuming that it's always reasonable to check for QueryCancel
and ProcDie at the same places.  I do not see any hole in that theory,
but if someone finds one, we could introduce additional flags/counters
to distinguish safe-to-cancel from safe-to-die states.

Comments?
        regards, tom lane


Re: SIGTERM -> elog(FATAL) -> proc_exit() is probably a bad idea

From
Tom Lane
Date:
"Hiroshi Inoue" <Inoue@tpf.co.jp> writes:
> Hmm, CancelQuery isn't so urgent an operation currently.
> For example, VACUUM checks QueryCancel flag only
> once per table.

Right, we'll need to check in more places.  See my just-posted proposal.
Checking at any spinlock grab should ensure that we check reasonably
often.

I just realized I forgot to mention the case of SIGTERM while the main
line is waiting for input from the frontend --- obviously we want to
quit immediately in that case, too.
        regards, tom lane