stopgap fix for signal handling during restore_command - Mailing list pgsql-hackers

From Nathan Bossart
Subject stopgap fix for signal handling during restore_command
Date
Msg-id 20230223231503.GA743455@nathanxps13
Whole thread Raw
Responses Re: stopgap fix for signal handling during restore_command
List pgsql-hackers
On Wed, Feb 22, 2023 at 09:48:10PM +1300, Thomas Munro wrote:
> On Tue, Feb 21, 2023 at 5:50 PM Nathan Bossart <nathandbossart@gmail.com> wrote:
>> I'm happy to create a new thread if needed, but I can't tell if there is
>> any interest in this stopgap/back-branch fix.  Perhaps we should just jump
>> straight to the long-term fix that Thomas is looking into.
> 
> Unfortunately the latch-friendly subprocess module proposal I was
> talking about would be for 17.  I may post a thread fairly soon with
> design ideas + list of problems and decision points as I see them, and
> hopefully some sketch code, but it won't be a proposal for [/me checks
> calendar] next week's commitfest and probably wouldn't be appropriate
> in a final commitfest anyway, and I also have some other existing
> stuff to clear first.  So please do continue with the stopgap ideas.

Okay, here is a new thread...

Since v8.4, the startup process will proc_exit() immediately within its
SIGTERM handler while the restore_command executes via system().   Some
recent changes added unsafe code to the section where this behavior is
enabled [0].  The long-term fix likely includes moving away from system()
completely, but we may want to have a stopgap/back-branch fix while that is
under development.

I've attached a patch set for a proposed stopgap fix.  0001 simply moves
the extra code outside of the Pre/PostRestoreCommand() block so that only
system() is executed while the SIGTERM handler might proc_exit().  This
restores the behavior that was in place from v8.4 to v14, so I don't expect
it to be too controversial.  0002 adds code to startup's SIGTERM handler to
call _exit() instead of proc_exit() if we are in a forked process from
system(), etc.  It also adds assertions to ensure proc_exit(), ProcKill(),
and AuxiliaryProcKill() are not called within such forked processes.

Thoughts?

[0] https://postgr.es/m/20230201105514.rsjl4bnhb65giyvo%40alap3.anarazel.de

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

Attachment

pgsql-hackers by date:

Previous
From: Andrey Chudnovsky
Date:
Subject: Re: [PoC] Federated Authn/z with OAUTHBEARER
Next
From: Nathan Bossart
Date:
Subject: Re: Weird failure with latches in curculio on v15