On Wed, Feb 22, 2023 at 09:48:10PM +1300, Thomas Munro wrote:
> On Tue, Feb 21, 2023 at 5:50 PM Nathan Bossart <nathandbossart@gmail.com> wrote:
>> I'm happy to create a new thread if needed, but I can't tell if there is
>> any interest in this stopgap/back-branch fix. Perhaps we should just jump
>> straight to the long-term fix that Thomas is looking into.
>
> Unfortunately the latch-friendly subprocess module proposal I was
> talking about would be for 17. I may post a thread fairly soon with
> design ideas + list of problems and decision points as I see them, and
> hopefully some sketch code, but it won't be a proposal for [/me checks
> calendar] next week's commitfest and probably wouldn't be appropriate
> in a final commitfest anyway, and I also have some other existing
> stuff to clear first. So please do continue with the stopgap ideas.
Okay, here is a new thread...
Since v8.4, the startup process will proc_exit() immediately within its
SIGTERM handler while the restore_command executes via system(). Some
recent changes added unsafe code to the section where this behavior is
enabled [0]. The long-term fix likely includes moving away from system()
completely, but we may want to have a stopgap/back-branch fix while that is
under development.
I've attached a patch set for a proposed stopgap fix. 0001 simply moves
the extra code outside of the Pre/PostRestoreCommand() block so that only
system() is executed while the SIGTERM handler might proc_exit(). This
restores the behavior that was in place from v8.4 to v14, so I don't expect
it to be too controversial. 0002 adds code to startup's SIGTERM handler to
call _exit() instead of proc_exit() if we are in a forked process from
system(), etc. It also adds assertions to ensure proc_exit(), ProcKill(),
and AuxiliaryProcKill() are not called within such forked processes.
Thoughts?
[0] https://postgr.es/m/20230201105514.rsjl4bnhb65giyvo%40alap3.anarazel.de
--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com