Re: Weird failure with latches in curculio on v15 - Mailing list pgsql-hackers

From Nathan Bossart
Subject Re: Weird failure with latches in curculio on v15
Date
Msg-id 20230203185417.GA243728@nathanxps13
Whole thread Raw
In response to Re: Weird failure with latches in curculio on v15  (Andres Freund <andres@anarazel.de>)
Responses Re: Weird failure with latches in curculio on v15
Re: Weird failure with latches in curculio on v15
List pgsql-hackers
On Thu, Feb 02, 2023 at 11:44:22PM -0800, Andres Freund wrote:
> On 2023-02-03 02:24:03 -0500, Tom Lane wrote:
>> Andres Freund <andres@anarazel.de> writes:
>> > A workaround for the back branches could be to have a test in
>> > StartupProcShutdownHandler() that tests if MyProcPid == getpid(), and
>> > not do the proc_exit() if they don't match. We probably should just do
>> > an _exit() in that case.
>> 
>> Might work.
> 
> I wonder if we should add code complaining loudly about such a mismatch
> to proc_exit(), in addition to handling it more silently in
> StartupProcShutdownHandler().   Also, an assertion in
> [Auxiliary]ProcKill that proc->xid == MyProcPid == getpid() seems like a
> good idea.

From the discussion, it sounds like we don't want to depend on the child
process receiving/handling the signal, so we can't get rid of the
break-out-of-system() behavior (at least not in back-branches).  I've put
together some work-in-progress patches for the stopgap/back-branch fix.

0001 is just v1-0001 from upthread.  This moves Pre/PostRestoreCommand to
surround only the call to system().  I think this should get us closer to
pre-v15 behavior.

0002 adds the getpid() check mentioned above to
StartupProcShutdownHandler(), and it adds assertions to proc_exit() and
[Auxiliary]ProcKill().

0003 adds checks for shutdown requests before and after the call to
shell_restore().  IMO the Pre/PostRestoreCommand stuff is an implementation
detail for restore_command, so I think it behooves us to have some
additional shutdown checks that apply even for recovery modules.  This
patch could probably be moved to the recovery modules thread.

Is this somewhat close to what folks had in mind?

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

Attachment

pgsql-hackers by date:

Previous
From: Nitin Jadhav
Date:
Subject: Re: Fix GUC_NO_SHOW_ALL test scenario in 003_check_guc.pl
Next
From: Corey Huinker
Date:
Subject: Re: proposal: psql: psql variable BACKEND_PID