Re: Is RecoveryConflictInterrupt() entirely safe in a signal handler? - Mailing list pgsql-hackers

From Andres Freund
Subject Re: Is RecoveryConflictInterrupt() entirely safe in a signal handler?
Date
Msg-id 20220409213916.4ydygnkf37dd4sw2@alap3.anarazel.de
Whole thread Raw
In response to Re: Is RecoveryConflictInterrupt() entirely safe in a signal handler?  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Is RecoveryConflictInterrupt() entirely safe in a signal handler?  (Andres Freund <andres@anarazel.de>)
List pgsql-hackers
Hi,

On 2022-04-09 17:00:41 -0400, Tom Lane wrote:
> Thomas Munro <thomas.munro@gmail.com> writes:
> > Unlike most "procsignal" handler routines, RecoveryConflictInterrupt()
> > doesn't just set a sig_atomic_t flag and poke the latch.  Is the extra
> > stuff it does safe?  For example, is this call stack OK (to pick one
> > that jumps out, but not the only one)?
> 
> > procsignal_sigusr1_handler
> > -> RecoveryConflictInterrupt
> >  -> HoldingBufferPinThatDelaysRecovery
> >   -> GetPrivateRefCount
> >    -> GetPrivateRefCountEntry
> >     -> hash_search(...hash table that might be in the middle of an update...)
> 
> Ugh.  That one was safe before somebody decided we needed a hash table
> for buffer refcounts, but it's surely not safe now.

Mea culpa. This is 4b4b680c3d6d - from 2014.


> Which, of course, demonstrates the folly of allowing signal handlers to call
> much of anything; but especially doing so without clearly marking the called
> functions as needing to be signal safe.

Yea. Particularly when just going through bufmgr and updating places that look
at pin counts, it's not immediately obvious that
HoldingBufferPinThatDelaysRecovery() runs in a signal handler. Partially
because RecoveryConflictInterrupt() - which is mentioned in the comment above
HoldingBufferPinThatDelaysRecovery() - sounds a lot like it's called from
ProcessInterrupts(), which doesn't run in a signal handler...

RecoveryConflictInterrupt() calls a lot of functions, some of which quite
plausibly could be changed to not be signal safe, even if they currently are.


Is there really a reason for RecoveryConflictInterrupt() to run in a signal
handler? Given that we only react to conflicts in ProcessInterrupts(), it's
not immediately obvious that we need to do anything in
RecoveryConflictInterrupt() but set some flags.  There's probably some minor
efficiency gains, but that seems unconvincing.


The comments really need a rewrite - it sounds like
RecoveryConflictInterrupt() will error out itself:

                /*
                 * If we can abort just the current subtransaction then we are
                 * OK to throw an ERROR to resolve the conflict. Otherwise
                 * drop through to the FATAL case.
                 *
                 * XXX other times that we can throw just an ERROR *may* be
                 * PROCSIG_RECOVERY_CONFLICT_LOCK if no locks are held in
                 * parent transactions
                 *
                 * PROCSIG_RECOVERY_CONFLICT_SNAPSHOT if no snapshots are held
                 * by parent transactions and the transaction is not
                 * transaction-snapshot mode
                 *
                 * PROCSIG_RECOVERY_CONFLICT_TABLESPACE if no temp files or
                 * cursors open in parent transactions
                 */

it's technically not *wrong* because it's setting up state that then leads to
ERROR / FATAL being thrown, but ...


Greetings,

Andres Freund



pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: Is RecoveryConflictInterrupt() entirely safe in a signal handler?
Next
From: Andres Freund
Date:
Subject: Re: failures in t/031_recovery_conflict.pl on CI