Re: Weird failure with latches in curculio on v15 - Mailing list pgsql-hackers

From Nathan Bossart
Subject Re: Weird failure with latches in curculio on v15
Date
Msg-id 20230203053548.GA27055@nathanxps13
In response to Re: Weird failure with latches in curculio on v15  (Nathan Bossart <nathandbossart@gmail.com>)
List pgsql-hackers
On Thu, Feb 02, 2023 at 02:39:19PM -0800, Nathan Bossart wrote:
> Maybe we could just
> remove this exit-in-SIGTERM-handler business...

I've spent some time testing this.  It seems to work pretty well, but only
if I keep the exit-on-SIGTERM logic in shell_restore().  Without that, I'm
seeing delayed shutdowns, which I assume means
HandleStartupProcInterrupts() isn't getting called (I'm still investigating
this).  In any case, the fact that shell_restore() exits if the command
fails due to SIGTERM seems like an implementation detail that we won't
necessarily want to rely on once recovery modules are available.  In short,
we seem to depend on the SIGTERM handling in RestoreArchivedFile() in order
to be responsive to shutdown requests.

One idea I have is to approximate the current behavior by simply checking
for the shutdown_requested flag before and after executing
restore_command.  This seems to work as desired even if the exit-on-SIGTERM
logic is removed from shell_restore().  Unless there is some reason to
break out of system() (versus just waiting for the command to fail after it
receives SIGTERM), I think this approach should suffice.
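
For illustration, the check-before-and-after idea can be sketched as a
standalone C program fragment.  This is not the attached patch and not
PostgreSQL code: the flag, the handler, the result enum, and
run_restore_command() are all hypothetical stand-ins, and a real startup
process would exit through its own interrupt handling rather than return a
status code.

```c
#include <signal.h>
#include <stdlib.h>

/* Hypothetical stand-in for the startup process's shutdown flag. */
volatile sig_atomic_t shutdown_requested = 0;

/* Only set the flag; do not exit from inside the signal handler. */
void
handle_sigterm(int signo)
{
    (void) signo;
    shutdown_requested = 1;
}

/* Hypothetical helper: install the flag-setting SIGTERM handler. */
void
install_sigterm_handler(void)
{
    signal(SIGTERM, handle_sigterm);
}

/* Hypothetical result codes, used only by this sketch. */
typedef enum
{
    RESTORE_OK,
    RESTORE_FAILED,
    RESTORE_SHUTDOWN
} RestoreResult;

/*
 * Run the restore command, checking the shutdown flag both before and
 * after system().  The call to system() itself is not interrupted; we
 * simply wait for the command to finish (or fail) and then look at the
 * flag again.
 */
RestoreResult
run_restore_command(const char *command)
{
    int rc;

    /* A shutdown was requested before we started: skip the command. */
    if (shutdown_requested)
        return RESTORE_SHUTDOWN;

    rc = system(command);

    /* A shutdown arrived while the command was running. */
    if (shutdown_requested)
        return RESTORE_SHUTDOWN;

    return (rc == 0) ? RESTORE_OK : RESTORE_FAILED;
}
```

In this sketch, responsiveness to a shutdown request depends on how quickly
the command itself returns after SIGTERM, which is the trade-off described
above.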

I've attached a draft patch.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

Attachment
