Thread: Use standard SIGHUP and SIGTERM handlers in autoprewarm module

Use standard SIGHUP and SIGTERM handlers in autoprewarm module

From

Bharath Rupireddy

Date:

05 October 2020, 10:45:49

Hi,

Autoprewarm module is using it's own SIGHUP(apw_sigterm_handler, got_sigterm) and SIGTERM(apw_sighup_handler, got_sighup) handlers which are similar to standard signal handlers(except for a difference [1]). Isn't it good to remove them and use standard SignalHandlerForConfigReload and SignalHandlerForShutdownRequest?

Attaching the patch for the above changes.

Looks like the commit[2] replaced custom handlers with standard handlers.

Thoughts?

[1] apw_sigterm_handler() and apw_sighup_handler() use MyProc->procLatch
if (MyProc)
SetLatch(&MyProc->procLatch);
where as standard handlers use MyLatch
SetLatch(MyLatch);
Both MyProc->procLatch and MyLatch point to same, see comment from global.c
/*
* MyLatch points to the latch that should be used for signal handling by the
* current process. It will either point to a process local latch if the
* current process does not have a PGPROC entry in that moment, or to
* PGPROC->procLatch if it has. Thus it can always be used in signal handlers,
* without checking for its existence.
*/

struct Latch *MyLatch;

(gdb) p MyProc->procLatch
$6 = {is_set = 0, is_shared = true, owner_pid = 1448807}
(gdb) p MyLatch
$7 = (struct Latch *) 0x7fcacc6d902c
(gdb) p &MyProc->procLatch
$8 = (Latch *) 0x7fcacc6d902c
(gdb) p *MyLatch
$9 = {is_set = 0, is_shared = true, owner_pid = 1448807}

[2] commit 1e53fe0e70f610c34f4c9e770d108cd94151342c
Author: Robert Haas <rhaas@postgresql.org>
Date: 2019-12-17 13:03:57 -0500

Use PostgresSigHupHandler in more places.

There seems to be no reason for every background process to have
its own flag indicating that a config-file reload is needed.
Instead, let's just use ConfigFilePending for that purpose
everywhere.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com

Attachment

v1-Use-Standard-SIGTERM-SIGHUP-Handlers-In-AutoPrewarm-Module.patch

Re: Use standard SIGHUP and SIGTERM handlers in autoprewarm module

From

Fujii Masao

Date:

05 October 2020, 14:34:05

On 2020/10/05 19:45, Bharath Rupireddy wrote:
> Hi,
> 
> Autoprewarm module is using it's own SIGHUP(apw_sigterm_handler, got_sigterm) and SIGTERM(apw_sighup_handler,
got_sighup)handlers which are similar to standard signal handlers(except for a difference [1]). Isn't it good to remove
themand  use standard SignalHandlerForConfigReload and SignalHandlerForShutdownRequest?

> 
> Attaching the patch for the above changes.
> 
> Looks like the commit[2] replaced custom handlers with standard handlers.
> 
> Thoughts?

+1

The patch looks good to me.

ISTM that we can also replace StartupProcSigHupHandler() in startup.c
with SignalHandlerForConfigReload() by making the startup process use
the general shared latch instead of its own one. POC patch attached.
Thought?

Probably we can also replace sigHupHandler() in syslogger.c with
SignalHandlerForConfigReload()? This would be separate patch, though.

Regards,

-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

Attachment

make_startup_use_general_shared_latch.patch

Re: Use standard SIGHUP and SIGTERM handlers in autoprewarm module

From

Bharath Rupireddy

Date:

05 October 2020, 16:18:19

On Mon, Oct 5, 2020 at 8:04 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
>
> On 2020/10/05 19:45, Bharath Rupireddy wrote:
> > Hi,
> >
> > Autoprewarm module is using it's own SIGHUP(apw_sigterm_handler, got_sigterm) and SIGTERM(apw_sighup_handler,
got_sighup)handlers which are similar to standard signal handlers(except for a difference [1]). Isn't it good to remove
themand  use standard SignalHandlerForConfigReload and SignalHandlerForShutdownRequest? 
> >
> > Attaching the patch for the above changes.
> >
> > Looks like the commit[2] replaced custom handlers with standard handlers.
> >
> > Thoughts?
>
> +1
>
> The patch looks good to me.
>

Thanks.

>
> ISTM that we can also replace StartupProcSigHupHandler() in startup.c
> with SignalHandlerForConfigReload() by making the startup process use
> the general shared latch instead of its own one. POC patch attached.
> Thought?
>

I'm not quite sure whether it breaks something or not. I see that
WakeupRecovery() with XLogCtl->recoveryWakeupLatch latch from the
startup process is also being used in the walreceiver process. I may
be wrong, but have some concern if the behaviour is different in case
of EXEC_BACKEND and Windows?

Another concern is that we are always using
XLogCtl->recoveryWakeupLatch in shared mode, this makes sense as this
latch is also being used in walrecevier. But sometimes, MyLatch is
created in non-shared mode as well(see InitLatch(MyLatch)).

Others may have better thoughts though.

>
> Probably we can also replace sigHupHandler() in syslogger.c with
> SignalHandlerForConfigReload()? This would be separate patch, though.
>

+1 to replace sigHupHandler() with SignalHandlerForConfigReload() as
the latch and the functionality are pretty much the same.

WalReceiverMai(): I think we can also replace WalRcvShutdownHandler()
with SignalHandlerForShutdownRequest() because walrcv->latch point to
&MyProc->procLatch which in turn point to MyLatch.

Thoughts? If okay, we can combine these into a single patch. I will
post an updated patch soon.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com

Re: Use standard SIGHUP and SIGTERM handlers in autoprewarm module

From

Fujii Masao

Date:

06 October 2020, 05:50:50


On 2020/10/06 1:18, Bharath Rupireddy wrote:
> On Mon, Oct 5, 2020 at 8:04 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
>>
>> On 2020/10/05 19:45, Bharath Rupireddy wrote:
>>> Hi,
>>>
>>> Autoprewarm module is using it's own SIGHUP(apw_sigterm_handler, got_sigterm) and SIGTERM(apw_sighup_handler,
got_sighup)handlers which are similar to standard signal handlers(except for a difference [1]). Isn't it good to remove
themand  use standard SignalHandlerForConfigReload and SignalHandlerForShutdownRequest?
 
>>>
>>> Attaching the patch for the above changes.
>>>
>>> Looks like the commit[2] replaced custom handlers with standard handlers.
>>>
>>> Thoughts?
>>
>> +1
>>
>> The patch looks good to me.
>>
> 
> Thanks.
> 
>>
>> ISTM that we can also replace StartupProcSigHupHandler() in startup.c
>> with SignalHandlerForConfigReload() by making the startup process use
>> the general shared latch instead of its own one. POC patch attached.
>> Thought?
>>
> 
> I'm not quite sure whether it breaks something or not. I see that
> WakeupRecovery() with XLogCtl->recoveryWakeupLatch latch from the
> startup process is also being used in the walreceiver process. I may
> be wrong, but have some concern if the behaviour is different in case
> of EXEC_BACKEND and Windows?

Unless I'm wrong, regarding MyLatch, the behavior is not different
whether in EXEC_BACKEND or not. In both cases, SwitchToSharedLatch()
is called and MyLatch is set to the shared latch, i.e., MyProc->procLatch.


> 
> Another concern is that we are always using
> XLogCtl->recoveryWakeupLatch in shared mode, this makes sense as this
> latch is also being used in walrecevier. But sometimes, MyLatch is
> created in non-shared mode as well(see InitLatch(MyLatch)).

Yes, and then the startup process calls SwitchToSharedLatch() and
repoint MyLatch to the shared one.

> 
> Others may have better thoughts though.

Okay, I will reconsider the patch and post it separately later if necessary.


> 
>>
>> Probably we can also replace sigHupHandler() in syslogger.c with
>> SignalHandlerForConfigReload()? This would be separate patch, though.
>>
> 
> +1 to replace sigHupHandler() with SignalHandlerForConfigReload() as
> the latch and the functionality are pretty much the same.
> 
> WalReceiverMai(): I think we can also replace WalRcvShutdownHandler()
> with SignalHandlerForShutdownRequest() because walrcv->latch point to
> &MyProc->procLatch which in turn point to MyLatch.
> 
> Thoughts? If okay, we can combine these into a single patch. I will
> post an updated patch soon.

+1 Or it's also ok to make each patch separately.
Anyway, thanks!

Regards,

-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

Re: Use standard SIGHUP and SIGTERM handlers in autoprewarm module

From

Michael Paquier

Date:

06 October 2020, 05:53:55

On Mon, Oct 05, 2020 at 11:34:05PM +0900, Fujii Masao wrote:
> ISTM that we can also replace StartupProcSigHupHandler() in startup.c
> with SignalHandlerForConfigReload() by making the startup process use
> the general shared latch instead of its own one. POC patch attached.
> Thought?

That looks good to me.  Nice cleanup.

> Probably we can also replace sigHupHandler() in syslogger.c with
> SignalHandlerForConfigReload()? This would be separate patch, though.

+1.
--
Michael

Attachment

signature.asc

Re: Use standard SIGHUP and SIGTERM handlers in autoprewarm module

From

Bharath Rupireddy

Date:

06 October 2020, 06:11:18

On Tue, Oct 6, 2020 at 11:20 AM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
>
> >>
> >> Probably we can also replace sigHupHandler() in syslogger.c with
> >> SignalHandlerForConfigReload()? This would be separate patch, though.
> >>
> >
> > +1 to replace sigHupHandler() with SignalHandlerForConfigReload() as
> > the latch and the functionality are pretty much the same.
> >
> > WalReceiverMai(): I think we can also replace WalRcvShutdownHandler()
> > with SignalHandlerForShutdownRequest() because walrcv->latch point to
> > &MyProc->procLatch which in turn point to MyLatch.
> >
> > Thoughts? If okay, we can combine these into a single patch. I will
> > post an updated patch soon.
>
> +1 Or it's also ok to make each patch separately.
> Anyway, thanks!
>

Thanks. +1 to have separate patches. I will post two separate patches
for autoprewarm and WalRcvShutdownHandler for further review. The
other two patches can be for startup.c and syslogger.c.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com

Re: Use standard SIGHUP and SIGTERM handlers in autoprewarm module

From

Bharath Rupireddy

Date:

07 October 2020, 02:30:32

On Tue, Oct 6, 2020 at 11:41 AM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
>
> On Tue, Oct 6, 2020 at 11:20 AM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
> >
> > +1 Or it's also ok to make each patch separately.
> > Anyway, thanks!
> >
>
> Thanks. +1 to have separate patches. I will post two separate patches
> for autoprewarm and WalRcvShutdownHandler for further review. The
> other two patches can be for startup.c and syslogger.c.
>

I'm attaching all the 4 patches here together.

1. v1-use-standard-SIGHUP-and-SIGTERM-handlers-in-autoprewarm-process.patch
--  This is the patch initially sent in this mail by me, I just
renamed it.
2. v1-use-MyLatch-and-standard-SIGHUP-handler-in-startup-process.patch
-- This is the patch proposed by Fuji Masao, I just renamed it.
3. v1-use-standard-SIGHUP-hanlder-in-syslogger-process.patch
4. v1-use-standard-SIGTERM-handler-in-walreceiver-process.patch

Please consider these patches for further review.


With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com

Attachment

Re: Use standard SIGHUP and SIGTERM handlers in autoprewarm module

From

Bharath Rupireddy

Date:

07 October 2020, 12:38:58

On Wed, Oct 7, 2020 at 8:00 AM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
>
> On Tue, Oct 6, 2020 at 11:41 AM Bharath Rupireddy
> <bharath.rupireddyforpostgres@gmail.com> wrote:
> >
> > On Tue, Oct 6, 2020 at 11:20 AM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
> > >
> > > +1 Or it's also ok to make each patch separately.
> > > Anyway, thanks!
> > >
> >
> > Thanks. +1 to have separate patches. I will post two separate patches
> > for autoprewarm and WalRcvShutdownHandler for further review. The
> > other two patches can be for startup.c and syslogger.c.
> >
>
> I'm attaching all the 4 patches here together.
>
> 1. v1-use-standard-SIGHUP-and-SIGTERM-handlers-in-autoprewarm-process.patch
> --  This is the patch initially sent in this mail by me, I just
> renamed it.
> 2. v1-use-MyLatch-and-standard-SIGHUP-handler-in-startup-process.patch
> -- This is the patch proposed by Fuji Masao, I just renamed it.
> 3. v1-use-standard-SIGHUP-hanlder-in-syslogger-process.patch
> 4. v1-use-standard-SIGTERM-handler-in-walreceiver-process.patch
>
> Please consider these patches for further review.
>

Added this to the commitfest for further review.

https://commitfest.postgresql.org/30/2756/



With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com

Re: Use standard SIGHUP and SIGTERM handlers in autoprewarm module

From

Craig Ringer

Date:

23 October 2020, 04:06:50

On Wed, Oct 7, 2020 at 8:39 PM Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote:

On Wed, Oct 7, 2020 at 8:00 AM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
>
> On Tue, Oct 6, 2020 at 11:41 AM Bharath Rupireddy
> <bharath.rupireddyforpostgres@gmail.com> wrote:
> >
> > On Tue, Oct 6, 2020 at 11:20 AM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
> > >
> > > +1 Or it's also ok to make each patch separately.
> > > Anyway, thanks!
> > >
> >
> > Thanks. +1 to have separate patches. I will post two separate patches
> > for autoprewarm and WalRcvShutdownHandler for further review. The
> > other two patches can be for startup.c and syslogger.c.

[ CC'd Robert Haas since he's the main name in interrupt.c, test_shm_mq/worker.c, ]

src/test/modules/test_shm_mq/worker.c appears to do the right thing the wrong way - it has its own custom handler instead of using die() or SignalHandlerForShutdownRequest().

In contrast src/test/modules/worker_spi/worker_spi.c looks plain wrong. Especially since it's quoted as an example of how to do things right. It won't respond to SIGTERM at all while it's executing a query from its queue, no matter how long that query takes or whether it blocks. It can inhibit even postmaster shutdown as a result.

I was going to lob off a quick patch to fix this by making both use quickdie() for SIGQUIT and die() for SIGTERM, but after reading for a bit I'm no longer sure what the right choice even is. I'd welcome some opinions.

The problem is that but interrupt.c and interrupt.h actually define and recommend different and simpler handlers for these jobs - ones that don't actually work properly for code that calls into Pg core and might rely on CHECK_FOR_INTERRUPTS(), InterruptsPending and ProcDiePending to properly respect SIGTERM.

And to add to the confusion the bgworker infra adds its own different default SIGTERM handler bgworker_die() that's weirdly in-between the interrupt.c and postgres.c signal handling.

So I'm no longer sure how the example code should even be fixed. I'm not convinced everyone using die() and quickdie() is good given they currently seem to be assumed to be mainly for user backends. Maybe wwe should move them to interrupt.c along with CHECK_FOR_INTERRUPTS(), ProcessInterrupts, etc and document them as for all database-connected or shmem-connected backends to use.

So in the medium term, interrupt.c's SignalHandlerForShutdownRequest() and SignalHandlerForCrashExit() should be combined with die() and quickdie(), integrating properly with CHECK_FOR_INTERRUPTS(), ProcessInterrupts(), etc. We can add a hook we call before we proc_exit() in response to ProcDiePending so backends can choose to mask it if there's a period during which they wish to defer responding to SIGTERM, but by default everything will respect SIGTERM -> die() sets ProcDiePending -> CHECK_FOR_INTERRUPTS() -> ProcessInterrupts() -> proc_exit() . interrupt.c's SignalHandlerForCrashExit() and SignalHandlerForShutdownRequest() become deprecated/legacy. We add a separate temporary handler that's installed by init.c for early SIGQUIT handling but document it as to be replaced after backends start properly. We'd delete the bgw-specific signal handlers and install die() and procdie() instead during StartBackgroundWorker - at least if the bgw is connecting to shmem or a database. interrupt.c's HandleMainLoopInterrupts() could be static inlined, and adopted in the bgworker examples and all the other places that currently do ConfigReloadPending / ProcessConfigFile() etc themselves.

It wouldn't be a clean sweep of consistent signal handling, given all the funky stuff we have in the checkpointer, walsender, etc. But I think it might help...

(And maybe I could even combine the various am_foo and is_bar globals we use to identify different sorts of backend, while doing such cleanups).

Re: Use standard SIGHUP and SIGTERM handlers in autoprewarm module

From

Fujii Masao

Date:

29 October 2020, 06:21:57


On 2020/10/07 11:30, Bharath Rupireddy wrote:
> On Tue, Oct 6, 2020 at 11:41 AM Bharath Rupireddy
> <bharath.rupireddyforpostgres@gmail.com> wrote:
>>
>> On Tue, Oct 6, 2020 at 11:20 AM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
>>>
>>> +1 Or it's also ok to make each patch separately.
>>> Anyway, thanks!
>>>
>>
>> Thanks. +1 to have separate patches. I will post two separate patches
>> for autoprewarm and WalRcvShutdownHandler for further review. The
>> other two patches can be for startup.c and syslogger.c.
>>
> 
> I'm attaching all the 4 patches here together.
> 
> 1. v1-use-standard-SIGHUP-and-SIGTERM-handlers-in-autoprewarm-process.patch
> --  This is the patch initially sent in this mail by me, I just
> renamed it.
> 2. v1-use-MyLatch-and-standard-SIGHUP-handler-in-startup-process.patch
> -- This is the patch proposed by Fuji Masao, I just renamed it.
> 3. v1-use-standard-SIGHUP-hanlder-in-syslogger-process.patch
> 4. v1-use-standard-SIGTERM-handler-in-walreceiver-process.patch
> 
> Please consider these patches for further review.

Thanks for the patches! They look good to me. So barring any objections,
I will commit them one by one.

Regards,

-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

Re: Use standard SIGHUP and SIGTERM handlers in autoprewarm module

From

Fujii Masao

Date:

04 November 2020, 09:06:22


On 2020/10/29 15:21, Fujii Masao wrote:
> 
> 
> On 2020/10/07 11:30, Bharath Rupireddy wrote:
>> On Tue, Oct 6, 2020 at 11:41 AM Bharath Rupireddy
>> <bharath.rupireddyforpostgres@gmail.com> wrote:
>>>
>>> On Tue, Oct 6, 2020 at 11:20 AM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
>>>>
>>>> +1 Or it's also ok to make each patch separately.
>>>> Anyway, thanks!
>>>>
>>>
>>> Thanks. +1 to have separate patches. I will post two separate patches
>>> for autoprewarm and WalRcvShutdownHandler for further review. The
>>> other two patches can be for startup.c and syslogger.c.
>>>
>>
>> I'm attaching all the 4 patches here together.
>>
>> 1. v1-use-standard-SIGHUP-and-SIGTERM-handlers-in-autoprewarm-process.patch
>> --  This is the patch initially sent in this mail by me, I just
>> renamed it.
>> 2. v1-use-MyLatch-and-standard-SIGHUP-handler-in-startup-process.patch
>> -- This is the patch proposed by Fuji Masao, I just renamed it.
>> 3. v1-use-standard-SIGHUP-hanlder-in-syslogger-process.patch
>> 4. v1-use-standard-SIGTERM-handler-in-walreceiver-process.patch
>>
>> Please consider these patches for further review.
> 
> Thanks for the patches! They look good to me. So barring any objections,
> I will commit them one by one.

I pushed the following two patches.

- v1-use-standard-SIGHUP-hanlder-in-syslogger-process.patch
- v1-use-MyLatch-and-standard-SIGHUP-handler-in-startup-process.patch

Regarding other two patches, I think that it's better to use MyLatch
rather than MyProc->procLatch or walrcv->latch in WaitLatch() and
ResetLatch(), like other code does. Attached are the updated versions
of the patches. Thought?

Regards,

-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

Attachment

Re: Use standard SIGHUP and SIGTERM handlers in autoprewarm module

From

Bharath Rupireddy

Date:

04 November 2020, 15:46:29

On Wed, Nov 4, 2020 at 2:36 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
>
> Regarding other two patches, I think that it's better to use MyLatch
> rather than MyProc->procLatch or walrcv->latch in WaitLatch() and
> ResetLatch(), like other code does. Attached are the updated versions
> of the patches. Thought?
>

+1 for replacing MyProc->procLatch with MyLatch in the autoprewarm
module, and the patch looks good to me.

I'm not quite sure to replace all the places in the walreceiver
process, for instance in WalRcvForceReply() we are using spinlock to
acquire the latch pointer and . Others may have better thoughts on
this.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com

Re: Use standard SIGHUP and SIGTERM handlers in autoprewarm module

From

Kyotaro Horiguchi

Date:

05 November 2020, 03:12:06

At Wed, 4 Nov 2020 21:16:29 +0530, Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote in 
> On Wed, Nov 4, 2020 at 2:36 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
> >
> > Regarding other two patches, I think that it's better to use MyLatch
> > rather than MyProc->procLatch or walrcv->latch in WaitLatch() and
> > ResetLatch(), like other code does. Attached are the updated versions
> > of the patches. Thought?
> >
> 
> +1 for replacing MyProc->procLatch with MyLatch in the autoprewarm
> module, and the patch looks good to me.

Looks good to me, too.

> I'm not quite sure to replace all the places in the walreceiver
> process, for instance in WalRcvForceReply() we are using spinlock to
> acquire the latch pointer and . Others may have better thoughts on
> this.

The SIGTERM part looks good. The only difference between
WalRcvSigHupHandler and SignalHandlerForConfigReload is whether latch
is set or not.  I don't think it's a problem that config-reload causes
an extra wakeup.  Couldn't we do the same thing for SIGHUP?

We might even be able to reload config file in
ProcessWalRcvInterrupts().

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

Re: Use standard SIGHUP and SIGTERM handlers in autoprewarm module

From

Fujii Masao

Date:

06 November 2020, 17:30:44


On 2020/11/05 12:12, Kyotaro Horiguchi wrote:
> At Wed, 4 Nov 2020 21:16:29 +0530, Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote in
>> On Wed, Nov 4, 2020 at 2:36 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
>>>
>>> Regarding other two patches, I think that it's better to use MyLatch
>>> rather than MyProc->procLatch or walrcv->latch in WaitLatch() and
>>> ResetLatch(), like other code does. Attached are the updated versions
>>> of the patches. Thought?
>>>
>>
>> +1 for replacing MyProc->procLatch with MyLatch in the autoprewarm
>> module, and the patch looks good to me.
> 
> Looks good to me, too.

Thanks for the review! I pushed the patch.

> 
>> I'm not quite sure to replace all the places in the walreceiver
>> process, for instance in WalRcvForceReply() we are using spinlock to
>> acquire the latch pointer and . Others may have better thoughts on
>> this.
> 
> The SIGTERM part looks good. The only difference between
> WalRcvSigHupHandler and SignalHandlerForConfigReload is whether latch
> is set or not.  I don't think it's a problem that config-reload causes
> an extra wakeup.  Couldn't we do the same thing for SIGHUP?

I agree that we can use even standard SIGHUP handler in walreceiver.
I'm not sure why SetLatch() was not called in walreceiver's SIGHUP
handler so far.

Regards,


-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

Re: Use standard SIGHUP and SIGTERM handlers in autoprewarm module

From

Bharath Rupireddy

Date:

07 November 2020, 14:01:21

On Fri, Nov 6, 2020 at 11:00 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
>
> >> I'm not quite sure to replace all the places in the walreceiver
> >> process, for instance in WalRcvForceReply() we are using spinlock to
> >> acquire the latch pointer and . Others may have better thoughts on
> >> this.
> >
> > The SIGTERM part looks good. The only difference between
> > WalRcvSigHupHandler and SignalHandlerForConfigReload is whether latch
> > is set or not.  I don't think it's a problem that config-reload causes
> > an extra wakeup.  Couldn't we do the same thing for SIGHUP?
>
> I agree that we can use even standard SIGHUP handler in walreceiver.
> I'm not sure why SetLatch() was not called in walreceiver's SIGHUP
> handler so far.
>

The main reason for having SetLatch() in
SignalHandlerForConfigReload() is to wake up the calling process if
waiting in WaitLatchOrSocket() or WaitLatch() and reload the new
config file and use the reloaded config variables. Maybe we should
give a thought on the scenarios in which the walreceiver process
waits, and what happens in case the latch is set when SIGHUP is
received.

And also, I think it's worth having a look at the commit 40f908bdcdc7
that introduced WalRcvSigHupHandler() and 597a87ccc that replaced
custom latch with procLatch.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com

Re: Use standard SIGHUP and SIGTERM handlers in autoprewarm module

From

Kyotaro Horiguchi

Date:

10 November 2020, 07:17:40

At Sat, 7 Nov 2020 19:31:21 +0530, Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote in 
> The main reason for having SetLatch() in
> SignalHandlerForConfigReload() is to wake up the calling process if
> waiting in WaitLatchOrSocket() or WaitLatch() and reload the new
> config file and use the reloaded config variables. Maybe we should
> give a thought on the scenarios in which the walreceiver process
> waits, and what happens in case the latch is set when SIGHUP is
> received.

The difference is whether the config file is processed at the next
wakeup (by force-reply-request or SIGTERM) of walreceiver or
immediately. If server-reload happened frequently, say, several times
per second(?), we should consider to reduce the useless reloading, but
actually that's not the case.

> And also, I think it's worth having a look at the commit 40f908bdcdc7
> that introduced WalRcvSigHupHandler() and 597a87ccc that replaced
> custom latch with procLatch.

At the time of the first patch, PostgreSQL processes used arbitrarily
implemented SIGHUP handlers for their own so it is natural that
walreceiver used its own one.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

Re: Use standard SIGHUP and SIGTERM handlers in autoprewarm module

From

Fujii Masao

Date:

10 November 2020, 09:34:44


On 2020/11/10 16:17, Kyotaro Horiguchi wrote:
> At Sat, 7 Nov 2020 19:31:21 +0530, Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote in
>> The main reason for having SetLatch() in
>> SignalHandlerForConfigReload() is to wake up the calling process if
>> waiting in WaitLatchOrSocket() or WaitLatch() and reload the new
>> config file and use the reloaded config variables. Maybe we should
>> give a thought on the scenarios in which the walreceiver process
>> waits, and what happens in case the latch is set when SIGHUP is
>> received.
> 
> The difference is whether the config file is processed at the next
> wakeup (by force-reply-request or SIGTERM) of walreceiver or
> immediately. If server-reload happened frequently, say, several times
> per second(?), we should consider to reduce the useless reloading, but
> actually that's not the case.

So, attached is the patch that makes walreceiver use both standard
SIGTERM and SIGHUP handlers. Currently I've not found any actual
issues by making walreceiver use standard SIGHUP handler, yet.

Regards,

-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

Attachment

v3-use-standard-SIGTERM-and-SIGHUP-handlers-in-walreceiver-process.patch

Re: Use standard SIGHUP and SIGTERM handlers in autoprewarm module

From

Bharath Rupireddy

Date:

10 November 2020, 12:30:07

On Tue, Nov 10, 2020 at 3:04 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
>
> >> The main reason for having SetLatch() in
> >> SignalHandlerForConfigReload() is to wake up the calling process if
> >> waiting in WaitLatchOrSocket() or WaitLatch() and reload the new
> >> config file and use the reloaded config variables. Maybe we should
> >> give a thought on the scenarios in which the walreceiver process
> >> waits, and what happens in case the latch is set when SIGHUP is
> >> received.
> >
> > The difference is whether the config file is processed at the next
> > wakeup (by force-reply-request or SIGTERM) of walreceiver or
> > immediately. If server-reload happened frequently, say, several times
> > per second(?), we should consider to reduce the useless reloading, but
> > actually that's not the case.
>
> So, attached is the patch that makes walreceiver use both standard
> SIGTERM and SIGHUP handlers. Currently I've not found any actual
> issues by making walreceiver use standard SIGHUP handler, yet.
>

I think it makes sense to replace WalRcv->latch with
MyProc->procLatch(as they point to the same latch) in the functions
that are called in the walreceiver process. However, we are using
walrcv->latch with spinlock, say in WalRcvForceReply() and
RequestXLogStreaming() both are called from the startup process. See
commit 45f9d08684, that made the access to the walreceiver's
latch(WalRcv->latch) by the other process(startup) spinlock protected

And looks like, in general it's a standard practice to set latch to
wake up the process if waiting in case a SIGHUP signal is received and
reload the relevant config variables.

Going by the above analysis, the v3 patch looks good to me.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com

Re: Use standard SIGHUP and SIGTERM handlers in autoprewarm module

From

Bharath Rupireddy

Date:

10 November 2020, 13:11:12

Hi,

Attaching a patch that replaces custom signal handlers for SIGHUP and
SIGTERM in worker_spi.c.

Thoughts?


With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com

Attachment

v1-0001-Use-standard-SIGHUP-SIGTERM-handlers-in-worker_sp.patch

Re: Use standard SIGHUP and SIGTERM handlers in autoprewarm module

From

Fujii Masao

Date:

12 November 2020, 04:36:07


On 2020/11/10 21:30, Bharath Rupireddy wrote:
> On Tue, Nov 10, 2020 at 3:04 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
>>
>>>> The main reason for having SetLatch() in
>>>> SignalHandlerForConfigReload() is to wake up the calling process if
>>>> waiting in WaitLatchOrSocket() or WaitLatch() and reload the new
>>>> config file and use the reloaded config variables. Maybe we should
>>>> give a thought on the scenarios in which the walreceiver process
>>>> waits, and what happens in case the latch is set when SIGHUP is
>>>> received.
>>>
>>> The difference is whether the config file is processed at the next
>>> wakeup (by force-reply-request or SIGTERM) of walreceiver or
>>> immediately. If server-reload happened frequently, say, several times
>>> per second(?), we should consider to reduce the useless reloading, but
>>> actually that's not the case.
>>
>> So, attached is the patch that makes walreceiver use both standard
>> SIGTERM and SIGHUP handlers. Currently I've not found any actual
>> issues by making walreceiver use standard SIGHUP handler, yet.
>>
> 
> I think it makes sense to replace WalRcv->latch with
> MyProc->procLatch(as they point to the same latch) in the functions
> that are called in the walreceiver process. However, we are using
> walrcv->latch with spinlock, say in WalRcvForceReply() and
> RequestXLogStreaming() both are called from the startup process. See
> commit 45f9d08684, that made the access to the walreceiver's
> latch(WalRcv->latch) by the other process(startup) spinlock protected
> 
> And looks like, in general it's a standard practice to set latch to
> wake up the process if waiting in case a SIGHUP signal is received and
> reload the relevant config variables.
> 
> Going by the above analysis, the v3 patch looks good to me.

Thanks for the analysis! I pushed the patch.

Regards,

-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

Re: Use standard SIGHUP and SIGTERM handlers in autoprewarm module

From

Bharath Rupireddy

Date:

13 November 2020, 11:24:28

On Thu, Nov 12, 2020 at 10:06 AM Fujii Masao
<masao.fujii@oss.nttdata.com> wrote:
>
> Thanks for the analysis! I pushed the patch.
>

Thanks! Since we are replacing custom SIGHUP and SIGTERM handlers with
standard ones, how about doing the same thing in worker_spi.c? I
posted a patch previously [1] in this mail thread. If it makes sense,
please review it.

[1] - https://www.postgresql.org/message-id/CALj2ACXDEZhAFOTDcqO9cFSRvrFLyYOnPKrcA1UG4uZn9hUAVg%40mail.gmail.com

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com

Re: Use standard SIGHUP and SIGTERM handlers in autoprewarm module

From

Fujii Masao

Date:

17 November 2020, 08:16:52

On 2020/11/13 20:24, Bharath Rupireddy wrote:
> On Thu, Nov 12, 2020 at 10:06 AM Fujii Masao
> <masao.fujii@oss.nttdata.com> wrote:
>>
>> Thanks for the analysis! I pushed the patch.
>>
> 
> Thanks! Since we are replacing custom SIGHUP and SIGTERM handlers with
> standard ones, how about doing the same thing in worker_spi.c? I
> posted a patch previously [1] in this mail thread. If it makes sense,
> please review it.

I agree to simplify the worker_spi code by making it use the standard
signal handlers. But as far as I read Craig Ringer's comments upthread
about worker_spi, it's not enough to just replace the dedicated SIGTERM
handler with the standard one. ISTM that probably worker_spi should
use the signal handler handling InterruptPending and ProcDiePending
like die() does. What do you think about Craig Ringer's comments?

Regards,

-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

Re: Use standard SIGHUP and SIGTERM handlers in autoprewarm module

From

Bharath Rupireddy

Date:

17 November 2020, 12:18:38

Thanks Craig!

On Fri, Oct 23, 2020 at 9:37 AM Craig Ringer
<craig.ringer@enterprisedb.com> wrote:
>
> src/test/modules/test_shm_mq/worker.c appears to do the right thing the wrong way - it has its own custom handler
insteadof using die() or SignalHandlerForShutdownRequest(). 
>

handle_sigterm() and die() are similar except that die() has extra
handling(below) for exiting immediately when waiting for input/command
from the client.
    /*
     * If we're in single user mode, we want to quit immediately - we can't
     * rely on latches as they wouldn't work when stdin/stdout is a file.
     * Rather ugly, but it's unlikely to be worthwhile to invest much more
     * effort just for the benefit of single user mode.
     */
    if (DoingCommandRead && whereToSendOutput != DestRemote)
        ProcessInterrupts();

Having this extra handling is correct for normal backends as they can
connect directly to clients for reading input commands, the above if
condition may become true and ProcessInterrupts() may be called to
handle the SIGTERM immediately.

Note that DoingCommandRead can never be true in bg workers as it's
being set to true only in normal backend PostgresMain(). If we use
die()(instead of custom SIGTERM handlers such as handle_sigterm()) in
a bg worker/non-backend process, there are no chances that the above
part of the code gets hit and the interrupts are processed
immediately. And also here are the bg worker process that use die()
instead of their own custom handlers: parallel workers, autoprewarm
worker, autovacuum worker, logical replication launcher and apply
worker, wal sender process

I think we can also use die() instead of handle_sigterm() in
test_shm_mq/worker.c.

I attached a patch for this change.

>
> In contrast  src/test/modules/worker_spi/worker_spi.c looks plain wrong. Especially since it's quoted as an example
ofhow to do things right. It won't respond to SIGTERM at all while it's executing a query from its queue, no matter how
longthat query takes or whether it blocks. It can inhibit even postmaster shutdown as a result. 
>

Postmaster sends SIGTERM to all children(backends and bgworkers) for
both smart shutdown(wait for children to end their work, then shut
down.) and fast shutdown(rollback active transactions and shutdown
when they are gone.) For immediate shutdown SIGQUIT is sent to
children.(see pmdie()).

For each bg worker(so is for worker_spi.c),
SignalHandlerForCrashExit() is set as SIGQUIT handler in
InitPostmasterChild(), which does nothing but exits immediately. We
can not use quickdie() here because as a bg worker we don't have
to/can not send anything to client. We are good with
SignalHandlerForCrashExit() as with all other bg workers.

Yes, if having worker_spi_sigterm/SignalHandlerForShutdownRequest,
sometimes(as explained above) the worker_spi worker may not respond to
SIGTERM. I think we should be having die() as SIGTERM handler here (as
with normal backend and parallel workers).

Attaching another patch that has replaced custom SIGTERM handler
worker_spi_sigterm() with die() and custom SIGHUP handler
worker_spi_sighup() with standard SignalHandlerForConfigReload().

>
> I was going to lob off a quick patch to fix this by making both use quickdie() for SIGQUIT and die() for SIGTERM, but
afterreading for a bit I'm no longer sure what the right choice even is. I'd welcome some opinions. 
>

We can not use quickdie() here because as a bg worker we don't have
to/can not send anything to client. We are good with
SignalHandlerForCrashExit() as with all other bg workers.

Thoughts?

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com

Attachment

Re: Use standard SIGHUP and SIGTERM handlers in autoprewarm module

From

Fujii Masao

Date:

18 November 2020, 12:22:50


On 2020/11/17 21:18, Bharath Rupireddy wrote:
> Thanks Craig!
> 
> On Fri, Oct 23, 2020 at 9:37 AM Craig Ringer
> <craig.ringer@enterprisedb.com> wrote:
>>
>> src/test/modules/test_shm_mq/worker.c appears to do the right thing the wrong way - it has its own custom handler
insteadof using die() or SignalHandlerForShutdownRequest().
 
>>
> 
> handle_sigterm() and die() are similar except that die() has extra
> handling(below) for exiting immediately when waiting for input/command
> from the client.
>      /*
>       * If we're in single user mode, we want to quit immediately - we can't
>       * rely on latches as they wouldn't work when stdin/stdout is a file.
>       * Rather ugly, but it's unlikely to be worthwhile to invest much more
>       * effort just for the benefit of single user mode.
>       */
>      if (DoingCommandRead && whereToSendOutput != DestRemote)
>          ProcessInterrupts();
> 
> Having this extra handling is correct for normal backends as they can
> connect directly to clients for reading input commands, the above if
> condition may become true and ProcessInterrupts() may be called to
> handle the SIGTERM immediately.
> 
> Note that DoingCommandRead can never be true in bg workers as it's
> being set to true only in normal backend PostgresMain(). If we use
> die()(instead of custom SIGTERM handlers such as handle_sigterm()) in
> a bg worker/non-backend process, there are no chances that the above
> part of the code gets hit and the interrupts are processed
> immediately. And also here are the bg worker process that use die()
> instead of their own custom handlers: parallel workers, autoprewarm
> worker, autovacuum worker, logical replication launcher and apply
> worker, wal sender process
> 
> I think we can also use die() instead of handle_sigterm() in
> test_shm_mq/worker.c.
> 
> I attached a patch for this change.

Thanks for the patch! It looks good to me.

BTW, I tried to find the past discussion about why die() was not used,
but I could not yet.


> 
>>
>> In contrast  src/test/modules/worker_spi/worker_spi.c looks plain wrong. Especially since it's quoted as an example
ofhow to do things right. It won't respond to SIGTERM at all while it's executing a query from its queue, no matter how
longthat query takes or whether it blocks. It can inhibit even postmaster shutdown as a result.
 
>>
> 
> Postmaster sends SIGTERM to all children(backends and bgworkers) for
> both smart shutdown(wait for children to end their work, then shut
> down.) and fast shutdown(rollback active transactions and shutdown
> when they are gone.) For immediate shutdown SIGQUIT is sent to
> children.(see pmdie()).
> 
> For each bg worker(so is for worker_spi.c),
> SignalHandlerForCrashExit() is set as SIGQUIT handler in
> InitPostmasterChild(), which does nothing but exits immediately. We
> can not use quickdie() here because as a bg worker we don't have
> to/can not send anything to client. We are good with
> SignalHandlerForCrashExit() as with all other bg workers.
> 
> Yes, if having worker_spi_sigterm/SignalHandlerForShutdownRequest,
> sometimes(as explained above) the worker_spi worker may not respond to
> SIGTERM. I think we should be having die() as SIGTERM handler here (as
> with normal backend and parallel workers).
> 
> Attaching another patch that has replaced custom SIGTERM handler
> worker_spi_sigterm() with die() and custom SIGHUP handler
> worker_spi_sighup() with standard SignalHandlerForConfigReload().

This patch also looks good to me.


> 
>>
>> I was going to lob off a quick patch to fix this by making both use quickdie() for SIGQUIT and die() for SIGTERM,
butafter reading for a bit I'm no longer sure what the right choice even is. I'd welcome some opinions.
 
>>
> 
> We can not use quickdie() here because as a bg worker we don't have
> to/can not send anything to client. We are good with
> SignalHandlerForCrashExit() as with all other bg workers.
> 
> Thoughts?

I think you're right.

Regards,

-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

Re: Use standard SIGHUP and SIGTERM handlers in autoprewarm module

From

Bharath Rupireddy

Date:

20 November 2020, 10:33:50

On Wed, Nov 18, 2020 at 5:52 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
>
> > handle_sigterm() and die() are similar except that die() has extra
> > handling(below) for exiting immediately when waiting for input/command
> > from the client.
> > /*
> > * If we're in single user mode, we want to quit immediately - we can't
> > * rely on latches as they wouldn't work when stdin/stdout is a file.
> > * Rather ugly, but it's unlikely to be worthwhile to invest much more
> > * effort just for the benefit of single user mode.
> > */
> > if (DoingCommandRead && whereToSendOutput != DestRemote)
> > ProcessInterrupts();
> >
> > Having this extra handling is correct for normal backends as they can
> > connect directly to clients for reading input commands, the above if
> > condition may become true and ProcessInterrupts() may be called to
> > handle the SIGTERM immediately.
> >
> > Note that DoingCommandRead can never be true in bg workers as it's
> > being set to true only in normal backend PostgresMain(). If we use
> > die()(instead of custom SIGTERM handlers such as handle_sigterm()) in
> > a bg worker/non-backend process, there are no chances that the above
> > part of the code gets hit and the interrupts are processed
> > immediately. And also here are the bg worker process that use die()
> > instead of their own custom handlers: parallel workers, autoprewarm
> > worker, autovacuum worker, logical replication launcher and apply
> > worker, wal sender process
> >
> > I think we can also use die() instead of handle_sigterm() in
> > test_shm_mq/worker.c.
> >
> > I attached a patch for this change.
>
> Thanks for the patch! It looks good to me.
>

Thanks.

>
> BTW, I tried to find the past discussion about why die() was not used,
> but I could not yet.
>

I think the commit 2505ce0 that altered the comment has something: handle_sigterm() is stripped down to remove if (DoingCommandRead && whereToSendOutput != DestRemote) part from die() since test_shm_mq bg worker or for that matter any bg worker do not have an equivalent of the regular backend's command-read loop.

--- a/src/test/modules/test_shm_mq/worker.c
+++ b/src/test/modules/test_shm_mq/worker.c
@@ -60,13 +60,9 @@ test_shm_mq_main(Datum main_arg)
*
* We want CHECK_FOR_INTERRUPTS() to kill off this worker process just as
* it would a normal user backend. To make that happen, we establish a
- * signal handler that is a stripped-down version of die(). We don't have
- * any equivalent of the backend's command-read loop, where interrupts can
- * be processed immediately, so make sure ImmediateInterruptOK is turned
- * off.
+ * signal handler that is a stripped-down version of die().
*/
pqsignal(SIGTERM, handle_sigterm);
- ImmediateInterruptOK = false;

BackgroundWorkerUnblockSignals();

>
> > Attaching another patch that has replaced custom SIGTERM handler
> > worker_spi_sigterm() with die() and custom SIGHUP handler
> > worker_spi_sighup() with standard SignalHandlerForConfigReload().
>
> This patch also looks good to me.
>

Thanks.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com

Re: Use standard SIGHUP and SIGTERM handlers in autoprewarm module

From

Fujii Masao

Date:

25 November 2020, 09:59:25

On 2020/11/20 19:33, Bharath Rupireddy wrote:
> On Wed, Nov 18, 2020 at 5:52 PM Fujii Masao <masao.fujii@oss.nttdata.com <mailto:masao.fujii@oss.nttdata.com>>
wrote:
>  >
>  > > handle_sigterm() and die() are similar except that die() has extra
>  > > handling(below) for exiting immediately when waiting for input/command
>  > > from the client.
>  > >      /*
>  > >       * If we're in single user mode, we want to quit immediately - we can't
>  > >       * rely on latches as they wouldn't work when stdin/stdout is a file.
>  > >       * Rather ugly, but it's unlikely to be worthwhile to invest much more
>  > >       * effort just for the benefit of single user mode.
>  > >       */
>  > >      if (DoingCommandRead && whereToSendOutput != DestRemote)
>  > >          ProcessInterrupts();
>  > >
>  > > Having this extra handling is correct for normal backends as they can
>  > > connect directly to clients for reading input commands, the above if
>  > > condition may become true and ProcessInterrupts() may be called to
>  > > handle the SIGTERM immediately.
>  > >
>  > > Note that DoingCommandRead can never be true in bg workers as it's
>  > > being set to true only in normal backend PostgresMain(). If we use
>  > > die()(instead of custom SIGTERM handlers such as handle_sigterm()) in
>  > > a bg worker/non-backend process, there are no chances that the above
>  > > part of the code gets hit and the interrupts are processed
>  > > immediately. And also here are the bg worker process that use die()
>  > > instead of their own custom handlers: parallel workers, autoprewarm
>  > > worker, autovacuum worker, logical replication launcher and apply
>  > > worker, wal sender process
>  > >
>  > > I think we can also use die() instead of handle_sigterm() in
>  > > test_shm_mq/worker.c.
>  > >
>  > > I attached a patch for this change.
>  >
>  > Thanks for the patch! It looks good to me.
>  >
> 
> Thanks.

When I read the patch again, I found that, with the patch, the shutdown
of worker_spi causes to report the following FATAL message.

     FATAL:  terminating connection due to administrator command

Isn't this message confusing because it's not a connection? If so,
we need to update ProcessInterrupts() so that the proper message is
reported like other bgworkers do.

BTW, though this is not directly related to this topic, it's strange to me
that the normal shutdown of bgworker causes to emit FATAL message
and the message "background worker ... exited with exit code 1"
at the normal shutdown. I've heard sometimes that users coplained that
it's confusing to report that message for logical replication launcher
at the normal shutdown. Maybe the mechanism to shutdown bgworker
normally seems to have not been implemented well yet.

Without the patch, since worker_spi cannot handle the shutdown request
promptly while it's executing the backend code, maybe we can say this is
a bug. But worker_spi is a just example of the code, I'm not thinking to
do back-patch for now. Thought?

> 
>  >
>  > BTW, I tried to find the past discussion about why die() was not used,
>  > but I could not yet.
>  >
> 
> I think the commit 2505ce0 that altered the comment has something:

Thanks for the info! Will check that.

Regards,

-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

Re: Use standard SIGHUP and SIGTERM handlers in autoprewarm module

From

Bharath Rupireddy

Date:

25 November 2020, 14:38:57

On Wed, Nov 25, 2020 at 3:29 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
>
> On 2020/11/20 19:33, Bharath Rupireddy wrote:
> > On Wed, Nov 18, 2020 at 5:52 PM Fujii Masao <masao.fujii@oss.nttdata.com <mailto:masao.fujii@oss.nttdata.com>> wrote:
> > >
> > > > handle_sigterm() and die() are similar except that die() has extra
> > > > handling(below) for exiting immediately when waiting for input/command
> > > > from the client.
> > > > /*
> > > > * If we're in single user mode, we want to quit immediately - we can't
> > > > * rely on latches as they wouldn't work when stdin/stdout is a file.
> > > > * Rather ugly, but it's unlikely to be worthwhile to invest much more
> > > > * effort just for the benefit of single user mode.
> > > > */
> > > > if (DoingCommandRead && whereToSendOutput != DestRemote)
> > > > ProcessInterrupts();
> > > >
> > > > Having this extra handling is correct for normal backends as they can
> > > > connect directly to clients for reading input commands, the above if
> > > > condition may become true and ProcessInterrupts() may be called to
> > > > handle the SIGTERM immediately.
> > > >
> > > > Note that DoingCommandRead can never be true in bg workers as it's
> > > > being set to true only in normal backend PostgresMain(). If we use
> > > > die()(instead of custom SIGTERM handlers such as handle_sigterm()) in
> > > > a bg worker/non-backend process, there are no chances that the above
> > > > part of the code gets hit and the interrupts are processed
> > > > immediately. And also here are the bg worker process that use die()
> > > > instead of their own custom handlers: parallel workers, autoprewarm
> > > > worker, autovacuum worker, logical replication launcher and apply
> > > > worker, wal sender process
> > > >
> > > > I think we can also use die() instead of handle_sigterm() in
> > > > test_shm_mq/worker.c.
> > > >
> > > > I attached a patch for this change.
> > >
> > > Thanks for the patch! It looks good to me.
> > >
> >
> > Thanks.
>
> When I read the patch again, I found that, with the patch, the shutdown
> of worker_spi causes to report the following FATAL message.
>
> FATAL: terminating connection due to administrator command
>
> Isn't this message confusing because it's not a connection? If so,
> we need to update ProcessInterrupts() so that the proper message is
> reported like other bgworkers do.
>

This is also true for all the bgworker that use the die() handler. How about doing it the way bgworker_die() does in ProcessInterrupts()? This would give meaningful information. Thoughts? If okay, I can make a separate patch.

else if (RecoveryConflictPending)
{
/* Currently there is only one non-retryable recovery conflict */
Assert(RecoveryConflictReason == PROCSIG_RECOVERY_CONFLICT_DATABASE);
pgstat_report_recovery_conflict(RecoveryConflictReason);
ereport(FATAL,
(errcode(ERRCODE_DATABASE_DROPPED),
errmsg("terminating connection due to conflict with recovery"),
errdetail_recovery_conflict()));
}
else if (IsBackgroundWorker)
{
ereport(FATAL,
(errcode(ERRCODE_ADMIN_SHUTDOWN),
errmsg("terminating background worker \"%s\" due to administrator command",
MyBgworkerEntry->bgw_type)));
}
else
ereport(FATAL,
(errcode(ERRCODE_ADMIN_SHUTDOWN),
errmsg("terminating connection due to administrator command")));

>
> BTW, though this is not directly related to this topic, it's strange to me
> that the normal shutdown of bgworker causes to emit FATAL message
> and the message "background worker ... exited with exit code 1"
> at the normal shutdown. I've heard sometimes that users coplained that
> it's confusing to report that message for logical replication launcher
> at the normal shutdown. Maybe the mechanism to shutdown bgworker
> normally seems to have not been implemented well yet.

What do you mean by normal shutdown of bgworker? Is it that bgworker has exited successfully with exit code 0 or for some reason with exit code other than 0? Or is it when the postmaster is shutdown normally?

IIUC, when a bgworker exists either normally with an exit code 0 or other than 0, then CleanupBackgroundWorker() is called in postmaster, a message(like below) is prepared, and the LogChildExit() is called with either DEBUG1 or LOG level and for instance the message you specified gets printed "background worker ... exited with exit code 1". I have not seen a FATAL message similar to "background worker ... exited with exit code 1" at the normal shutdown.

snprintf(namebuf, MAXPGPATH, _("background worker \"%s\""), rw->rw_worker.bgw_type);

LogChildExit(EXIT_STATUS_0(exitstatus) ? DEBUG1 : LOG, namebuf, pid, exitstatus);

Am I missing something?

If my analysis is right, then for instance, when a logical replication launcher is exited, it logs "background worker "logical replication launcher" exited with exit code X" with either DEBUG1 or LOG level but not with FATAL level.

>
> Without the patch, since worker_spi cannot handle the shutdown request
> promptly while it's executing the backend code, maybe we can say this is
> a bug. But worker_spi is a just example of the code, I'm not thinking to
> do back-patch for now. Thought?

+1 to not backpatch it.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com

Re: Use standard SIGHUP and SIGTERM handlers in autoprewarm module

From

Fujii Masao

Date:

26 November 2020, 14:07:42


On 2020/11/25 23:38, Bharath Rupireddy wrote:
> On Wed, Nov 25, 2020 at 3:29 PM Fujii Masao <masao.fujii@oss.nttdata.com <mailto:masao.fujii@oss.nttdata.com>>
wrote:
>  >
>  > On 2020/11/20 19:33, Bharath Rupireddy wrote:
>  > > On Wed, Nov 18, 2020 at 5:52 PM Fujii Masao <masao.fujii@oss.nttdata.com <mailto:masao.fujii@oss.nttdata.com>
<mailto:masao.fujii@oss.nttdata.com<mailto:masao.fujii@oss.nttdata.com>>> wrote:
 
>  > >  >
>  > >  > > handle_sigterm() and die() are similar except that die() has extra
>  > >  > > handling(below) for exiting immediately when waiting for input/command
>  > >  > > from the client.
>  > >  > >      /*
>  > >  > >       * If we're in single user mode, we want to quit immediately - we can't
>  > >  > >       * rely on latches as they wouldn't work when stdin/stdout is a file.
>  > >  > >       * Rather ugly, but it's unlikely to be worthwhile to invest much more
>  > >  > >       * effort just for the benefit of single user mode.
>  > >  > >       */
>  > >  > >      if (DoingCommandRead && whereToSendOutput != DestRemote)
>  > >  > >          ProcessInterrupts();
>  > >  > >
>  > >  > > Having this extra handling is correct for normal backends as they can
>  > >  > > connect directly to clients for reading input commands, the above if
>  > >  > > condition may become true and ProcessInterrupts() may be called to
>  > >  > > handle the SIGTERM immediately.
>  > >  > >
>  > >  > > Note that DoingCommandRead can never be true in bg workers as it's
>  > >  > > being set to true only in normal backend PostgresMain(). If we use
>  > >  > > die()(instead of custom SIGTERM handlers such as handle_sigterm()) in
>  > >  > > a bg worker/non-backend process, there are no chances that the above
>  > >  > > part of the code gets hit and the interrupts are processed
>  > >  > > immediately. And also here are the bg worker process that use die()
>  > >  > > instead of their own custom handlers: parallel workers, autoprewarm
>  > >  > > worker, autovacuum worker, logical replication launcher and apply
>  > >  > > worker, wal sender process
>  > >  > >
>  > >  > > I think we can also use die() instead of handle_sigterm() in
>  > >  > > test_shm_mq/worker.c.
>  > >  > >
>  > >  > > I attached a patch for this change.
>  > >  >
>  > >  > Thanks for the patch! It looks good to me.
>  > >  >
>  > >
>  > > Thanks.
>  >
>  > When I read the patch again, I found that, with the patch, the shutdown
>  > of worker_spi causes to report the following FATAL message.
>  >
>  >      FATAL:  terminating connection due to administrator command
>  >
>  > Isn't this message confusing because it's not a connection? If so,
>  > we need to update ProcessInterrupts() so that the proper message is
>  > reported like other bgworkers do.
>  >
> 
> This is also true for all the bgworker that use the die() handler. How about doing it the way bgworker_die() does in
ProcessInterrupts()?This would give meaningful information. Thoughts? If okay, I can make a separate patch.
 

+1
Thanks!


> 
>          else if (RecoveryConflictPending)
>          {
>              /* Currently there is only one non-retryable recovery conflict */
>              Assert(RecoveryConflictReason == PROCSIG_RECOVERY_CONFLICT_DATABASE);
>              pgstat_report_recovery_conflict(RecoveryConflictReason);
>              ereport(FATAL,
>                      (errcode(ERRCODE_DATABASE_DROPPED),
>                       errmsg("terminating connection due to conflict with recovery"),
>                       errdetail_recovery_conflict()));
>          }
> *       else if (IsBackgroundWorker)
>         {
>                 ereport(FATAL,
>                             (errcode(ERRCODE_ADMIN_SHUTDOWN),
>                              errmsg("terminating background worker \"%s\" due to administrator command",
>                              MyBgworkerEntry->bgw_type)));
>         }*
>          else
>              ereport(FATAL,
>                      (errcode(ERRCODE_ADMIN_SHUTDOWN),
>                       errmsg("terminating connection due to administrator command")));
> 
>  >
>  > BTW, though this is not directly related to this topic, it's strange to me
>  > that the normal shutdown of bgworker causes to emit FATAL message
>  > and the message "background worker ... exited with exit code 1"
>  > at the normal shutdown. I've heard sometimes that users coplained that
>  > it's confusing to report that message for logical replication launcher
>  > at the normal shutdown. Maybe the mechanism to shutdown bgworker
>  > normally seems to have not been implemented well yet.
>  >
> 
> What do you mean by normal shutdown of bgworker? Is it that bgworker has exited successfully with exit code 0 or for
somereason with exit code other than 0? Or is it when the postmaster is shutdown normally?
 
> 
> IIUC, when a bgworker exists either normally with an exit code 0 or other than 0, then CleanupBackgroundWorker() is
calledin postmaster, a message(like below) is prepared, and the LogChildExit() is called with either DEBUG1 or LOG
leveland for instance the message you specified gets printed "background worker ... exited with exit code 1". I have
notseen a FATAL message similar to "background worker ... exited with exit code 1" at the normal shutdown.
 
> 
> snprintf(namebuf, MAXPGPATH, _("background worker \"%s\""), rw->rw_worker.bgw_type);
> 
> LogChildExit(EXIT_STATUS_0(exitstatus) ? DEBUG1 : LOG, namebuf, pid, exitstatus);
> 
> Am I missing something?
> 
> If my analysis is right, then for instance, when a logical replication launcher is exited, it logs "background worker
"logicalreplication launcher" exited with exit code X" with either DEBUG1 or LOG level but not with FATAL level.
 

Yes, it's not with FATAL level. But that message looks like that it's
reporting error message. This is why we sometimes received
the complaints (e.g., [1][2]) about that message.

[1]
https://postgr.es/m/CAKFQuwZHcZnnwoFaXX2YKNT3KhaNr_+vd95Oo=f045zfn7Tetw@mail.gmail.com

[2]
https://postgr.es/m/CAEepm=1c3hG1g3iKYwfa_PDsT49RBaBJsaot_qNhPSCXBm9rzA@mail.gmail.com

Regards,

-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

Re: Use standard SIGHUP and SIGTERM handlers in autoprewarm module

From

Bharath Rupireddy

Date:

26 November 2020, 15:15:29

On Thu, Nov 26, 2020 at 7:37 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
>
> > What do you mean by normal shutdown of bgworker? Is it that bgworker has exited successfully with exit code 0 or
forsome reason with exit code other than 0? Or is it when the postmaster is shutdown normally? 
> >
> > IIUC, when a bgworker exists either normally with an exit code 0 or other than 0, then CleanupBackgroundWorker() is
calledin postmaster, a message(like below) is prepared, and the LogChildExit() is called with either DEBUG1 or LOG
leveland for instance the message you specified gets printed "background worker ... exited with exit code 1". I have
notseen a FATAL message similar to "background worker ... exited with exit code 1" at the normal shutdown. 
> >
> > snprintf(namebuf, MAXPGPATH, _("background worker \"%s\""), rw->rw_worker.bgw_type);
> >
> > LogChildExit(EXIT_STATUS_0(exitstatus) ? DEBUG1 : LOG, namebuf, pid, exitstatus);
> >
> > Am I missing something?
> >
> > If my analysis is right, then for instance, when a logical replication launcher is exited, it logs "background
worker"logical replication launcher" exited with exit code X" with either DEBUG1 or LOG level but not with FATAL level. 
>
> Yes, it's not with FATAL level. But that message looks like that it's
> reporting error message. This is why we sometimes received
> the complaints (e.g., [1][2]) about that message.
>

Oh. Should we do something about it now? Any thoughts? I have not read
fully through the threads specified, I will go through them later.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com

Re: Use standard SIGHUP and SIGTERM handlers in autoprewarm module

From

Fujii Masao

Date:

26 November 2020, 15:23:23


On 2020/11/27 0:15, Bharath Rupireddy wrote:
> On Thu, Nov 26, 2020 at 7:37 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
>>
>>> What do you mean by normal shutdown of bgworker? Is it that bgworker has exited successfully with exit code 0 or
forsome reason with exit code other than 0? Or is it when the postmaster is shutdown normally?
 
>>>
>>> IIUC, when a bgworker exists either normally with an exit code 0 or other than 0, then CleanupBackgroundWorker() is
calledin postmaster, a message(like below) is prepared, and the LogChildExit() is called with either DEBUG1 or LOG
leveland for instance the message you specified gets printed "background worker ... exited with exit code 1". I have
notseen a FATAL message similar to "background worker ... exited with exit code 1" at the normal shutdown.
 
>>>
>>> snprintf(namebuf, MAXPGPATH, _("background worker \"%s\""), rw->rw_worker.bgw_type);
>>>
>>> LogChildExit(EXIT_STATUS_0(exitstatus) ? DEBUG1 : LOG, namebuf, pid, exitstatus);
>>>
>>> Am I missing something?
>>>
>>> If my analysis is right, then for instance, when a logical replication launcher is exited, it logs "background
worker"logical replication launcher" exited with exit code X" with either DEBUG1 or LOG level but not with FATAL
level.
>>
>> Yes, it's not with FATAL level. But that message looks like that it's
>> reporting error message. This is why we sometimes received
>> the complaints (e.g., [1][2]) about that message.
>>
> 
> Oh. Should we do something about it now?

No. This is not directly related to the issue that we are discussing
as I told upthread. Of course, it's better to work on this if we can
easily fix it. But seems not... So please ignore this my comment.

Regards,

-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

Re: Use standard SIGHUP and SIGTERM handlers in autoprewarm module

From

Bharath Rupireddy

Date:

27 November 2020, 03:13:34

On Wed, Nov 25, 2020 at 8:08 PM Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote:
>
> > When I read the patch again, I found that, with the patch, the shutdown
> > of worker_spi causes to report the following FATAL message.
> >
> > FATAL: terminating connection due to administrator command
> >
> > Isn't this message confusing because it's not a connection? If so,
> > we need to update ProcessInterrupts() so that the proper message is
> > reported like other bgworkers do.
> >
>
> This is also true for all the bgworker that use the die() handler. How about doing it the way bgworker_die() does in ProcessInterrupts()? This would give meaningful information. Thoughts? If okay, I can make a separate patch.
>

Attaching the patch that improved the message for bg workers in ProcessInterrupts(). For instance, now it looks like FATAL: terminating background worker "worker_spi" due to administrator command or FATAL: terminating background worker "parallel worker" due to administrator command and so on for other bg workers.

Please review the patch.

I'm also mentioning the 2 previous patches posted in [1]. One of the patch is for using die() instead of handle_sigterm() in test_shm_mq/worker.c and another is for replacing custom SIGTERM handler worker_spi_sigterm() with die() and custom SIGHUP handler worker_spi_sighup() with standard SignalHandlerForConfigReload()

[1] https://www.postgresql.org/message-id/CALj2ACWWy1YcngpCUn09AsXMfOzwjfNqbVosfoRY0vhhVWhVBw%40mail.gmail.com

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com

Attachment

v1-0001-Improve-error-message-in-ProcessInterrupts-for-bg-worker.patch

Re: Use standard SIGHUP and SIGTERM handlers in autoprewarm module

From

Fujii Masao

Date:

27 November 2020, 06:56:52


On 2020/11/27 12:13, Bharath Rupireddy wrote:
> On Wed, Nov 25, 2020 at 8:08 PM Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com
<mailto:bharath.rupireddyforpostgres@gmail.com>>wrote:
 
>  >
>  > > When I read the patch again, I found that, with the patch, the shutdown
>  > > of worker_spi causes to report the following FATAL message.
>  > >
>  > >      FATAL:  terminating connection due to administrator command
>  > >
>  > > Isn't this message confusing because it's not a connection? If so,
>  > > we need to update ProcessInterrupts() so that the proper message is
>  > > reported like other bgworkers do.
>  > >
>  >
>  > This is also true for all the bgworker that use the die() handler. How about doing it the way bgworker_die() does
inProcessInterrupts()? This would give meaningful information. Thoughts? If okay, I can make a separate patch.
 
>  >
> 
> Attaching the patch that improved the message for bg workers in ProcessInterrupts(). For instance, now it looks like
*FATAL: terminating background worker "worker_spi" due to administrator command* or *FATAL:  terminating background
worker"parallel worker" due to administrator command *and so on for other bg workers.*
 
> *
> 
> Please review the patch.

Thanks for the patch! It looks good to me.


> 
> I'm also mentioning the 2 previous patches posted in [1]. One of the patch is for using die() instead of
handle_sigterm()in test_shm_mq/worker.c and another is for replacing custom SIGTERM handler worker_spi_sigterm() with
die()and custom SIGHUP handler worker_spi_sighup() with standard SignalHandlerForConfigReload()
 

Yeah, I pushed them. Thanks!

Regards,

-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

Re: Use standard SIGHUP and SIGTERM handlers in autoprewarm module

From

Bharath Rupireddy

Date:

27 November 2020, 07:47:38

On Fri, Nov 27, 2020 at 12:26 PM Fujii Masao
<masao.fujii@oss.nttdata.com> wrote:
>
> >  > > When I read the patch again, I found that, with the patch, the shutdown
> >  > > of worker_spi causes to report the following FATAL message.
> >  > >
> >  > >      FATAL:  terminating connection due to administrator command
> >  > >
> >  > > Isn't this message confusing because it's not a connection? If so,
> >  > > we need to update ProcessInterrupts() so that the proper message is
> >  > > reported like other bgworkers do.
> >  > >
> >  >
> >  > This is also true for all the bgworker that use the die() handler. How about doing it the way bgworker_die()
doesin ProcessInterrupts()? This would give meaningful information. Thoughts? If okay, I can make a separate patch. 
> >  >
> >
> > Attaching the patch that improved the message for bg workers in ProcessInterrupts(). For instance, now it looks
like*FATAL:  terminating background worker "worker_spi" due to administrator command* or *FATAL:  terminating
backgroundworker "parallel worker" due to administrator command *and so on for other bg workers.* 
> > *
> >
> > Please review the patch.
>
> Thanks for the patch! It looks good to me.
>

Thanks!

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com

Re: Use standard SIGHUP and SIGTERM handlers in autoprewarm module

From

Fujii Masao

Date:

30 November 2020, 02:12:13


On 2020/11/27 16:47, Bharath Rupireddy wrote:
> On Fri, Nov 27, 2020 at 12:26 PM Fujii Masao
> <masao.fujii@oss.nttdata.com> wrote:
>>
>>>   > > When I read the patch again, I found that, with the patch, the shutdown
>>>   > > of worker_spi causes to report the following FATAL message.
>>>   > >
>>>   > >      FATAL:  terminating connection due to administrator command
>>>   > >
>>>   > > Isn't this message confusing because it's not a connection? If so,
>>>   > > we need to update ProcessInterrupts() so that the proper message is
>>>   > > reported like other bgworkers do.
>>>   > >
>>>   >
>>>   > This is also true for all the bgworker that use the die() handler. How about doing it the way bgworker_die()
doesin ProcessInterrupts()? This would give meaningful information. Thoughts? If okay, I can make a separate patch.
 
>>>   >
>>>
>>> Attaching the patch that improved the message for bg workers in ProcessInterrupts(). For instance, now it looks
like*FATAL:  terminating background worker "worker_spi" due to administrator command* or *FATAL:  terminating
backgroundworker "parallel worker" due to administrator command *and so on for other bg workers.*
 
>>> *
>>>
>>> Please review the patch.
>>
>> Thanks for the patch! It looks good to me.
>>
> 
> Thanks!

Pushed. Thanks!

Since all the patches that proposed in this thread were committed,
I marked the CF entry as committed.

Regards,

-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

Re: Use standard SIGHUP and SIGTERM handlers in autoprewarm module

From

Fujii Masao

Date:

15 December 2020, 14:10:28

On 2020/11/04 18:06, Fujii Masao wrote:
> 
> 
> On 2020/10/29 15:21, Fujii Masao wrote:
>>
>>
>> On 2020/10/07 11:30, Bharath Rupireddy wrote:
>>> On Tue, Oct 6, 2020 at 11:41 AM Bharath Rupireddy
>>> <bharath.rupireddyforpostgres@gmail.com> wrote:
>>>>
>>>> On Tue, Oct 6, 2020 at 11:20 AM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
>>>>>
>>>>> +1 Or it's also ok to make each patch separately.
>>>>> Anyway, thanks!
>>>>>
>>>>
>>>> Thanks. +1 to have separate patches. I will post two separate patches
>>>> for autoprewarm and WalRcvShutdownHandler for further review. The
>>>> other two patches can be for startup.c and syslogger.c.
>>>>
>>>
>>> I'm attaching all the 4 patches here together.
>>>
>>> 1. v1-use-standard-SIGHUP-and-SIGTERM-handlers-in-autoprewarm-process.patch
>>> --  This is the patch initially sent in this mail by me, I just
>>> renamed it.
>>> 2. v1-use-MyLatch-and-standard-SIGHUP-handler-in-startup-process.patch
>>> -- This is the patch proposed by Fuji Masao, I just renamed it.
>>> 3. v1-use-standard-SIGHUP-hanlder-in-syslogger-process.patch
>>> 4. v1-use-standard-SIGTERM-handler-in-walreceiver-process.patch
>>>
>>> Please consider these patches for further review.
>>
>> Thanks for the patches! They look good to me. So barring any objections,
>> I will commit them one by one.
> 
> I pushed the following two patches.
> 
> - v1-use-standard-SIGHUP-hanlder-in-syslogger-process.patch
> - v1-use-MyLatch-and-standard-SIGHUP-handler-in-startup-process.patch

As I told in other thread [1], I'm thinking to revert this patch
because this change could have bad side-effect on the startup
process waiting for recovery conflict.

Before applying the patch, the latch that the startup process
used to wait for recovery conflict was different from the latch
that SIGHUP signal handler or walreceiver process, etc set to
wake the startup process up. So SIGHUP or walreceiver didn't
wake the startup process waiting for recovery conflict up unnecessary.

But the patch got rid of the dedicated latch for signaling
the startup process. This change forced us to use the same latch
to make the startup process wait or wake up. Which caused SIGHUP
signal handler or walreceiver proces to wake the startup process
waiting on the latch for recovery conflict up unnecessarily frequently.

While waiting for recovery conflict on buffer pin, deadlock needs
to be checked at least every deadlock_timeout. But that frequent
wakeups could prevent the deadlock timer from being triggered and
could delay that deadlock checks.

Therefore, I'm thinking to revert the commit ac22929a26 and 113d3591b8.

[1]
https://www.postgresql.org/message-id/a802f1c0-58d9-bd3f-bc3e-bdad54726855@oss.nttdata.com

Regards,

-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

Re: Use standard SIGHUP and SIGTERM handlers in autoprewarm module

From

Kyotaro Horiguchi

Date:

16 December 2020, 07:24:59

At Tue, 15 Dec 2020 23:10:28 +0900, Fujii Masao <masao.fujii@oss.nttdata.com> wrote in 
> > I pushed the following two patches.
> > - v1-use-standard-SIGHUP-hanlder-in-syslogger-process.patch
> > - v1-use-MyLatch-and-standard-SIGHUP-handler-in-startup-process.patch
> 
> As I told in other thread [1], I'm thinking to revert this patch
> because this change could have bad side-effect on the startup
> process waiting for recovery conflict.
> 
> Before applying the patch, the latch that the startup process
> used to wait for recovery conflict was different from the latch
> that SIGHUP signal handler or walreceiver process, etc set to
> wake the startup process up. So SIGHUP or walreceiver didn't
> wake the startup process waiting for recovery conflict up unnecessary.
>
> But the patch got rid of the dedicated latch for signaling
> the startup process. This change forced us to use the same latch
> to make the startup process wait or wake up. Which caused SIGHUP
> signal handler or walreceiver proces to wake the startup process
> waiting on the latch for recovery conflict up unnecessarily
> frequently.
> 
> While waiting for recovery conflict on buffer pin, deadlock needs
> to be checked at least every deadlock_timeout. But that frequent
> wakeups could prevent the deadlock timer from being triggered and
> could delay that deadlock checks.

I thought that spurious wakeups don't harm. But actually
ResolveRecoveryConflictWithBufferPin doesn't consider spurious
wakeups.  Only the timer woke up ResolveRecoveryConflictWithBufferPin
before the patch comes. Currently SIGHUP and XLogFlush() (on
walreceiver) also wake up startup process.

For a moment I thought that ResolveRecoveryConflictWithBufferPin
should wake up at shutdown time by the old recovery latch but it's not
the case since it wakes up after all blockers go away.  It seems to me
simpler to revert the patches than making the function properly handle
spurious wakeups.

> Therefore, I'm thinking to revert the commit ac22929a26 and
> 113d3591b8.
> 
> [1]
> https://www.postgresql.org/message-id/a802f1c0-58d9-bd3f-bc3e-bdad54726855@oss.nttdata.com

As the result, I agree to revert them. But I think we need to add a
comment for the reason we don't use MyLatch for recovery-wakeup after
reverting them.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

Re: Use standard SIGHUP and SIGTERM handlers in autoprewarm module

From

Fujii Masao

Date:

16 December 2020, 09:12:39


On 2020/12/16 16:24, Kyotaro Horiguchi wrote:
> At Tue, 15 Dec 2020 23:10:28 +0900, Fujii Masao <masao.fujii@oss.nttdata.com> wrote in
>>> I pushed the following two patches.
>>> - v1-use-standard-SIGHUP-hanlder-in-syslogger-process.patch
>>> - v1-use-MyLatch-and-standard-SIGHUP-handler-in-startup-process.patch
>>
>> As I told in other thread [1], I'm thinking to revert this patch
>> because this change could have bad side-effect on the startup
>> process waiting for recovery conflict.
>>
>> Before applying the patch, the latch that the startup process
>> used to wait for recovery conflict was different from the latch
>> that SIGHUP signal handler or walreceiver process, etc set to
>> wake the startup process up. So SIGHUP or walreceiver didn't
>> wake the startup process waiting for recovery conflict up unnecessary.
>>
>> But the patch got rid of the dedicated latch for signaling
>> the startup process. This change forced us to use the same latch
>> to make the startup process wait or wake up. Which caused SIGHUP
>> signal handler or walreceiver proces to wake the startup process
>> waiting on the latch for recovery conflict up unnecessarily
>> frequently.
>>
>> While waiting for recovery conflict on buffer pin, deadlock needs
>> to be checked at least every deadlock_timeout. But that frequent
>> wakeups could prevent the deadlock timer from being triggered and
>> could delay that deadlock checks.
> 
> I thought that spurious wakeups don't harm. But actually
> ResolveRecoveryConflictWithBufferPin doesn't consider spurious
> wakeups.  Only the timer woke up ResolveRecoveryConflictWithBufferPin
> before the patch comes. Currently SIGHUP and XLogFlush() (on
> walreceiver) also wake up startup process.
> 
> For a moment I thought that ResolveRecoveryConflictWithBufferPin
> should wake up at shutdown time by the old recovery latch but it's not
> the case since it wakes up after all blockers go away.  It seems to me
> simpler to revert the patches than making the function properly handle
> spurious wakeups.
> 
>> Therefore, I'm thinking to revert the commit ac22929a26 and
>> 113d3591b8.
>>
>> [1]
>> https://www.postgresql.org/message-id/a802f1c0-58d9-bd3f-bc3e-bdad54726855@oss.nttdata.com
> 
> As the result, I agree to revert them. But I think we need to add a
> comment for the reason we don't use MyLatch for recovery-wakeup after
> reverting them.

Agreed. Attached is the patch that reverts those patches and
adds the comments about why both procLatch and recoveryWakeupLatch
are necessary.

Regards,

-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

Attachment

revert_remove_dedicated_latch_for_startup_process_v1.patch

Re: Use standard SIGHUP and SIGTERM handlers in autoprewarm module

From

Fujii Masao

Date:

17 December 2020, 09:12:39


On 2020/12/16 18:12, Fujii Masao wrote:
> 
> 
> On 2020/12/16 16:24, Kyotaro Horiguchi wrote:
>> At Tue, 15 Dec 2020 23:10:28 +0900, Fujii Masao <masao.fujii@oss.nttdata.com> wrote in
>>>> I pushed the following two patches.
>>>> - v1-use-standard-SIGHUP-hanlder-in-syslogger-process.patch
>>>> - v1-use-MyLatch-and-standard-SIGHUP-handler-in-startup-process.patch
>>>
>>> As I told in other thread [1], I'm thinking to revert this patch
>>> because this change could have bad side-effect on the startup
>>> process waiting for recovery conflict.
>>>
>>> Before applying the patch, the latch that the startup process
>>> used to wait for recovery conflict was different from the latch
>>> that SIGHUP signal handler or walreceiver process, etc set to
>>> wake the startup process up. So SIGHUP or walreceiver didn't
>>> wake the startup process waiting for recovery conflict up unnecessary.
>>>
>>> But the patch got rid of the dedicated latch for signaling
>>> the startup process. This change forced us to use the same latch
>>> to make the startup process wait or wake up. Which caused SIGHUP
>>> signal handler or walreceiver proces to wake the startup process
>>> waiting on the latch for recovery conflict up unnecessarily
>>> frequently.
>>>
>>> While waiting for recovery conflict on buffer pin, deadlock needs
>>> to be checked at least every deadlock_timeout. But that frequent
>>> wakeups could prevent the deadlock timer from being triggered and
>>> could delay that deadlock checks.
>>
>> I thought that spurious wakeups don't harm. But actually
>> ResolveRecoveryConflictWithBufferPin doesn't consider spurious
>> wakeups.  Only the timer woke up ResolveRecoveryConflictWithBufferPin
>> before the patch comes. Currently SIGHUP and XLogFlush() (on
>> walreceiver) also wake up startup process.
>>
>> For a moment I thought that ResolveRecoveryConflictWithBufferPin
>> should wake up at shutdown time by the old recovery latch but it's not
>> the case since it wakes up after all blockers go away.  It seems to me
>> simpler to revert the patches than making the function properly handle
>> spurious wakeups.
>>
>>> Therefore, I'm thinking to revert the commit ac22929a26 and
>>> 113d3591b8.
>>>
>>> [1]
>>> https://www.postgresql.org/message-id/a802f1c0-58d9-bd3f-bc3e-bdad54726855@oss.nttdata.com
>>
>> As the result, I agree to revert them. But I think we need to add a
>> comment for the reason we don't use MyLatch for recovery-wakeup after
>> reverting them.
> 
> Agreed. Attached is the patch that reverts those patches and
> adds the comments about why both procLatch and recoveryWakeupLatch
> are necessary.

Pushed. Thanks!

Regards,

-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

Re: Use standard SIGHUP and SIGTERM handlers in autoprewarm module

From

Andres Freund

Date:

29 April 2022, 19:18:15

Hi,

On 2020-12-16 18:12:39 +0900, Fujii Masao wrote:
> -    /* Wait to be signaled by UnpinBuffer() */
> +    /*
> +     * Wait to be signaled by UnpinBuffer().
> +     *
> +     * We assume that only UnpinBuffer() and the timeout requests established
> +     * above can wake us up here. WakeupRecovery() called by walreceiver or
> +     * SIGHUP signal handler, etc cannot do that because it uses the different
> +     * latch from that ProcWaitForSignal() waits on.
> +     */
>      ProcWaitForSignal(PG_WAIT_BUFFER_PIN);
>  
>      /*

Isn't this comment bogus?  The latch could e.g. be set by
procsignal_sigusr1_handler(), which the startup process uses. Or it could
already be set, when entering ResolveRecoveryConflictWithBufferPin().

Why is it even relevant that we only get woken up by UnpinBuffer()?

Greetings,

Andres Freund