Thread: Use standard SIGHUP and SIGTERM handlers in autoprewarm module
Hi,
The autoprewarm module uses its own SIGHUP (apw_sighup_handler, got_sighup) and SIGTERM (apw_sigterm_handler, got_sigterm) handlers, which are similar to the standard signal handlers (except for one difference [1]). Wouldn't it be good to remove them and use the standard SignalHandlerForConfigReload and SignalHandlerForShutdownRequest?
Attaching the patch for the above changes.
Looks like the commit[2] replaced custom handlers with standard handlers.
Thoughts?
[1] apw_sigterm_handler() and apw_sighup_handler() use MyProc->procLatch
if (MyProc)
SetLatch(&MyProc->procLatch);
whereas the standard handlers use MyLatch:
SetLatch(MyLatch);
Both MyProc->procLatch and MyLatch point to the same latch; see this comment from global.c:
/*
* MyLatch points to the latch that should be used for signal handling by the
* current process. It will either point to a process local latch if the
* current process does not have a PGPROC entry in that moment, or to
* PGPROC->procLatch if it has. Thus it can always be used in signal handlers,
* without checking for its existence.
*/
struct Latch *MyLatch;
(gdb) p MyProc->procLatch
$6 = {is_set = 0, is_shared = true, owner_pid = 1448807}
(gdb) p MyLatch
$7 = (struct Latch *) 0x7fcacc6d902c
(gdb) p &MyProc->procLatch
$8 = (Latch *) 0x7fcacc6d902c
(gdb) p *MyLatch
$9 = {is_set = 0, is_shared = true, owner_pid = 1448807}
[2] commit 1e53fe0e70f610c34f4c9e770d108cd94151342c
Author: Robert Haas <rhaas@postgresql.org>
Date: 2019-12-17 13:03:57 -0500
Use PostgresSigHupHandler in more places.
There seems to be no reason for every background process to have
its own flag indicating that a config-file reload is needed.
Instead, let's just use ConfigFilePending for that purpose
everywhere.
With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com
Attachment
On 2020/10/05 19:45, Bharath Rupireddy wrote:
> Autoprewarm module is using its own SIGHUP and SIGTERM handlers which are similar to the standard signal handlers (except for a difference [1]). Isn't it good to remove them and use the standard SignalHandlerForConfigReload and SignalHandlerForShutdownRequest?
>
> Attaching the patch for the above changes.
>
> Looks like the commit [2] replaced custom handlers with standard handlers.
>
> Thoughts?

+1

The patch looks good to me.

ISTM that we can also replace StartupProcSigHupHandler() in startup.c with SignalHandlerForConfigReload() by making the startup process use the general shared latch instead of its own one. POC patch attached. Thought?

Probably we can also replace sigHupHandler() in syslogger.c with SignalHandlerForConfigReload()? This would be a separate patch, though.

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION
Attachment
On Mon, Oct 5, 2020 at 8:04 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
> +1
>
> The patch looks good to me.

Thanks.

> ISTM that we can also replace StartupProcSigHupHandler() in startup.c with SignalHandlerForConfigReload() by making the startup process use the general shared latch instead of its own one. POC patch attached. Thought?

I'm not quite sure whether it breaks something or not. I see that WakeupRecovery() with the XLogCtl->recoveryWakeupLatch latch from the startup process is also being used in the walreceiver process. I may be wrong, but I have some concern whether the behaviour is different in the case of EXEC_BACKEND and Windows.

Another concern is that we are always using XLogCtl->recoveryWakeupLatch in shared mode; this makes sense as this latch is also being used in the walreceiver. But sometimes MyLatch is created in non-shared mode as well (see InitLatch(MyLatch)). Others may have better thoughts though.

> Probably we can also replace sigHupHandler() in syslogger.c with SignalHandlerForConfigReload()? This would be a separate patch, though.

+1 to replacing sigHupHandler() with SignalHandlerForConfigReload(), as the latch and the functionality are pretty much the same.

WalReceiverMain(): I think we can also replace WalRcvShutdownHandler() with SignalHandlerForShutdownRequest() because walrcv->latch points to &MyProc->procLatch, which is what MyLatch points to in turn.

Thoughts? If okay, we can combine these into a single patch. I will post an updated patch soon.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com
On 2020/10/06 1:18, Bharath Rupireddy wrote:
> On Mon, Oct 5, 2020 at 8:04 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
>> ISTM that we can also replace StartupProcSigHupHandler() in startup.c with SignalHandlerForConfigReload() by making the startup process use the general shared latch instead of its own one. POC patch attached. Thought?
>
> I'm not quite sure whether it breaks something or not. I see that WakeupRecovery() with the XLogCtl->recoveryWakeupLatch latch from the startup process is also being used in the walreceiver process. I may be wrong, but have some concern if the behaviour is different in case of EXEC_BACKEND and Windows?

Unless I'm wrong, regarding MyLatch, the behavior is not different whether in EXEC_BACKEND or not. In both cases, SwitchToSharedLatch() is called and MyLatch is set to the shared latch, i.e., MyProc->procLatch.

> Another concern is that we are always using XLogCtl->recoveryWakeupLatch in shared mode; this makes sense as this latch is also being used in the walreceiver. But sometimes MyLatch is created in non-shared mode as well (see InitLatch(MyLatch)).

Yes, and then the startup process calls SwitchToSharedLatch() and repoints MyLatch to the shared one.

> Others may have better thoughts though.

Okay, I will reconsider the patch and post it separately later if necessary.

>> Probably we can also replace sigHupHandler() in syslogger.c with SignalHandlerForConfigReload()? This would be a separate patch, though.
>
> +1 to replacing sigHupHandler() with SignalHandlerForConfigReload(), as the latch and the functionality are pretty much the same.
>
> WalReceiverMain(): I think we can also replace WalRcvShutdownHandler() with SignalHandlerForShutdownRequest() because walrcv->latch points to &MyProc->procLatch, which is what MyLatch points to in turn.
>
> Thoughts? If okay, we can combine these into a single patch. I will post an updated patch soon.

+1. Or it's also ok to make each patch separately. Anyway, thanks!

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION
On Mon, Oct 05, 2020 at 11:34:05PM +0900, Fujii Masao wrote:
> ISTM that we can also replace StartupProcSigHupHandler() in startup.c with SignalHandlerForConfigReload() by making the startup process use the general shared latch instead of its own one. POC patch attached. Thought?

That looks good to me. Nice cleanup.

> Probably we can also replace sigHupHandler() in syslogger.c with SignalHandlerForConfigReload()? This would be a separate patch, though.

+1.

--
Michael
Attachment
On Tue, Oct 6, 2020 at 11:20 AM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
> +1. Or it's also ok to make each patch separately.
> Anyway, thanks!

Thanks. +1 to having separate patches. I will post two separate patches for autoprewarm and WalRcvShutdownHandler for further review. The other two patches can be for startup.c and syslogger.c.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com
On Tue, Oct 6, 2020 at 11:41 AM Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote:
> Thanks. +1 to having separate patches. I will post two separate patches for autoprewarm and WalRcvShutdownHandler for further review. The other two patches can be for startup.c and syslogger.c.

I'm attaching all the 4 patches here together.

1. v1-use-standard-SIGHUP-and-SIGTERM-handlers-in-autoprewarm-process.patch
   -- This is the patch initially sent in this thread by me, just renamed.
2. v1-use-MyLatch-and-standard-SIGHUP-handler-in-startup-process.patch
   -- This is the patch proposed by Fujii Masao, just renamed.
3. v1-use-standard-SIGHUP-hanlder-in-syslogger-process.patch
4. v1-use-standard-SIGTERM-handler-in-walreceiver-process.patch

Please consider these patches for further review.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com
Attachment
On Wed, Oct 7, 2020 at 8:00 AM Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote:
> I'm attaching all the 4 patches here together.
>
> Please consider these patches for further review.

Added this to the commitfest for further review. https://commitfest.postgresql.org/30/2756/

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com
On Wed, Oct 7, 2020 at 8:39 PM Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote:
On Wed, Oct 7, 2020 at 8:00 AM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
>
> On Tue, Oct 6, 2020 at 11:41 AM Bharath Rupireddy
> <bharath.rupireddyforpostgres@gmail.com> wrote:
> >
> > On Tue, Oct 6, 2020 at 11:20 AM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
> > >
> > > +1 Or it's also ok to make each patch separately.
> > > Anyway, thanks!
> > >
> >
> > Thanks. +1 to have separate patches. I will post two separate patches
> > for autoprewarm and WalRcvShutdownHandler for further review. The
> > other two patches can be for startup.c and syslogger.c.
[ CC'd Robert Haas since he's the main name in interrupt.c and test_shm_mq/worker.c ]
src/test/modules/test_shm_mq/worker.c appears to do the right thing the wrong way - it has its own custom handler instead of using die() or SignalHandlerForShutdownRequest().
In contrast src/test/modules/worker_spi/worker_spi.c looks plain wrong. Especially since it's quoted as an example of how to do things right. It won't respond to SIGTERM at all while it's executing a query from its queue, no matter how long that query takes or whether it blocks. It can inhibit even postmaster shutdown as a result.
I was going to lob off a quick patch to fix this by making both use quickdie() for SIGQUIT and die() for SIGTERM, but after reading for a bit I'm no longer sure what the right choice even is. I'd welcome some opinions.
The problem is that interrupt.c and interrupt.h actually define and recommend different and simpler handlers for these jobs - ones that don't actually work properly for code that calls into Pg core and might rely on CHECK_FOR_INTERRUPTS(), InterruptPending and ProcDiePending to properly respect SIGTERM.
And to add to the confusion the bgworker infra adds its own different default SIGTERM handler bgworker_die() that's weirdly in-between the interrupt.c and postgres.c signal handling.
So I'm no longer sure how the example code should even be fixed. I'm not convinced that having everyone use die() and quickdie() is good, given they currently seem to be assumed to be mainly for user backends. Maybe we should move them to interrupt.c along with CHECK_FOR_INTERRUPTS(), ProcessInterrupts, etc and document them as for all database-connected or shmem-connected backends to use.
So in the medium term, interrupt.c's SignalHandlerForShutdownRequest() and SignalHandlerForCrashExit() should be combined with die() and quickdie(), integrating properly with CHECK_FOR_INTERRUPTS(), ProcessInterrupts(), etc. We can add a hook we call before we proc_exit() in response to ProcDiePending, so backends can choose to mask it if there's a period during which they wish to defer responding to SIGTERM, but by default everything will respect SIGTERM -> die() sets ProcDiePending -> CHECK_FOR_INTERRUPTS() -> ProcessInterrupts() -> proc_exit().

interrupt.c's SignalHandlerForCrashExit() and SignalHandlerForShutdownRequest() become deprecated/legacy. We add a separate temporary handler that's installed by init.c for early SIGQUIT handling, but document it as to be replaced after backends start properly. We'd delete the bgworker-specific signal handlers and install die() and quickdie() instead during StartBackgroundWorker - at least if the bgworker is connecting to shmem or a database.

interrupt.c's HandleMainLoopInterrupts() could be static inlined, and adopted in the bgworker examples and all the other places that currently do ConfigReloadPending / ProcessConfigFile() etc themselves.
It wouldn't be a clean sweep of consistent signal handling, given all the funky stuff we have in the checkpointer, walsender, etc. But I think it might help...
(And maybe I could even combine the various am_foo and is_bar globals we use to identify different sorts of backend, while doing such cleanups).
On 2020/10/07 11:30, Bharath Rupireddy wrote:
> I'm attaching all the 4 patches here together.
>
> 1. v1-use-standard-SIGHUP-and-SIGTERM-handlers-in-autoprewarm-process.patch
> 2. v1-use-MyLatch-and-standard-SIGHUP-handler-in-startup-process.patch
> 3. v1-use-standard-SIGHUP-hanlder-in-syslogger-process.patch
> 4. v1-use-standard-SIGTERM-handler-in-walreceiver-process.patch
>
> Please consider these patches for further review.

Thanks for the patches! They look good to me. So barring any objections, I will commit them one by one.

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION
On 2020/10/29 15:21, Fujii Masao wrote:
> On 2020/10/07 11:30, Bharath Rupireddy wrote:
>> I'm attaching all the 4 patches here together.
>>
>> Please consider these patches for further review.
>
> Thanks for the patches! They look good to me. So barring any objections, I will commit them one by one.

I pushed the following two patches.

- v1-use-standard-SIGHUP-hanlder-in-syslogger-process.patch
- v1-use-MyLatch-and-standard-SIGHUP-handler-in-startup-process.patch

Regarding the other two patches, I think that it's better to use MyLatch rather than MyProc->procLatch or walrcv->latch in WaitLatch() and ResetLatch(), like other code does. Attached are the updated versions of the patches. Thought?

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION
Attachment
On Wed, Nov 4, 2020 at 2:36 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
> Regarding the other two patches, I think that it's better to use MyLatch rather than MyProc->procLatch or walrcv->latch in WaitLatch() and ResetLatch(), like other code does. Attached are the updated versions of the patches. Thought?

+1 for replacing MyProc->procLatch with MyLatch in the autoprewarm module, and the patch looks good to me.

I'm not quite sure about replacing all the places in the walreceiver process; for instance, in WalRcvForceReply() we are using a spinlock to acquire the latch pointer. Others may have better thoughts on this.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com
At Wed, 4 Nov 2020 21:16:29 +0530, Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote in
> +1 for replacing MyProc->procLatch with MyLatch in the autoprewarm module, and the patch looks good to me.

Looks good to me, too.

> I'm not quite sure about replacing all the places in the walreceiver process; for instance, in WalRcvForceReply() we are using a spinlock to acquire the latch pointer. Others may have better thoughts on this.

The SIGTERM part looks good. The only difference between WalRcvSigHupHandler and SignalHandlerForConfigReload is whether the latch is set or not. I don't think it's a problem that a config reload causes an extra wakeup. Couldn't we do the same thing for SIGHUP? We might even be able to reload the config file in ProcessWalRcvInterrupts().

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center
On 2020/11/05 12:12, Kyotaro Horiguchi wrote:
> Looks good to me, too.

Thanks for the review! I pushed the patch.

> The SIGTERM part looks good. The only difference between WalRcvSigHupHandler and SignalHandlerForConfigReload is whether the latch is set or not. I don't think it's a problem that a config reload causes an extra wakeup. Couldn't we do the same thing for SIGHUP?

I agree that we can use even the standard SIGHUP handler in walreceiver. I'm not sure why SetLatch() was not called in walreceiver's SIGHUP handler so far.

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION
On Fri, Nov 6, 2020 at 11:00 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
> I agree that we can use even the standard SIGHUP handler in walreceiver. I'm not sure why SetLatch() was not called in walreceiver's SIGHUP handler so far.

The main reason for having SetLatch() in SignalHandlerForConfigReload() is to wake up the calling process if it is waiting in WaitLatchOrSocket() or WaitLatch(), so that it reloads the new config file and uses the reloaded config variables. Maybe we should give some thought to the scenarios in which the walreceiver process waits, and what happens if the latch is set when SIGHUP is received.

And also, I think it's worth having a look at commit 40f908bdcdc7, which introduced WalRcvSigHupHandler(), and 597a87ccc, which replaced the custom latch with procLatch.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com
At Sat, 7 Nov 2020 19:31:21 +0530, Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote in
> The main reason for having SetLatch() in SignalHandlerForConfigReload() is to wake up the calling process if it is waiting in WaitLatchOrSocket() or WaitLatch(), so that it reloads the new config file and uses the reloaded config variables. Maybe we should give some thought to the scenarios in which the walreceiver process waits, and what happens if the latch is set when SIGHUP is received.

The difference is whether the config file is processed at the next wakeup (by force-reply-request or SIGTERM) of walreceiver or immediately. If a server reload happened frequently, say, several times per second(?), we should consider reducing the useless reloading, but actually that's not the case.

> And also, I think it's worth having a look at commit 40f908bdcdc7, which introduced WalRcvSigHupHandler(), and 597a87ccc, which replaced the custom latch with procLatch.

At the time of the first patch, PostgreSQL processes used arbitrarily implemented SIGHUP handlers of their own, so it is natural that walreceiver used its own one.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center
On 2020/11/10 16:17, Kyotaro Horiguchi wrote:
> The difference is whether the config file is processed at the next wakeup (by force-reply-request or SIGTERM) of walreceiver or immediately. If a server reload happened frequently, say, several times per second(?), we should consider reducing the useless reloading, but actually that's not the case.

So, attached is the patch that makes walreceiver use both the standard SIGTERM and SIGHUP handlers. Currently I've not found any actual issues caused by making walreceiver use the standard SIGHUP handler yet.

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION
Attachment
On Tue, Nov 10, 2020 at 3:04 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
> So, attached is the patch that makes walreceiver use both the standard SIGTERM and SIGHUP handlers. Currently I've not found any actual issues caused by making walreceiver use the standard SIGHUP handler yet.

I think it makes sense to replace WalRcv->latch with MyProc->procLatch (as they point to the same latch) in the functions that are called in the walreceiver process. However, we are using walrcv->latch with a spinlock, say in WalRcvForceReply() and RequestXLogStreaming(), both of which are called from the startup process. See commit 45f9d08684, which made the access to the walreceiver's latch (WalRcv->latch) by the other process (startup) spinlock-protected.

And it looks like, in general, it's standard practice to set the latch to wake up the process if it is waiting when a SIGHUP signal is received, and to reload the relevant config variables.

Going by the above analysis, the v3 patch looks good to me.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com
Hi,

Attaching a patch that replaces the custom signal handlers for SIGHUP and SIGTERM in worker_spi.c. Thoughts?

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com
Attachment
On 2020/11/10 21:30, Bharath Rupireddy wrote:
> I think it makes sense to replace WalRcv->latch with MyProc->procLatch (as they point to the same latch) in the functions that are called in the walreceiver process. However, we are using walrcv->latch with a spinlock, say in WalRcvForceReply() and RequestXLogStreaming(), both of which are called from the startup process. See commit 45f9d08684, which made the access to the walreceiver's latch (WalRcv->latch) by the other process (startup) spinlock-protected.
>
> And it looks like, in general, it's standard practice to set the latch to wake up the process if it is waiting when a SIGHUP signal is received, and to reload the relevant config variables.
>
> Going by the above analysis, the v3 patch looks good to me.

Thanks for the analysis! I pushed the patch.

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION
On Thu, Nov 12, 2020 at 10:06 AM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
> Thanks for the analysis! I pushed the patch.

Thanks! Since we are replacing custom SIGHUP and SIGTERM handlers with standard ones, how about doing the same thing in worker_spi.c? I posted a patch previously [1] in this mail thread. If it makes sense, please review it.

[1] - https://www.postgresql.org/message-id/CALj2ACXDEZhAFOTDcqO9cFSRvrFLyYOnPKrcA1UG4uZn9hUAVg%40mail.gmail.com

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com
On 2020/11/13 20:24, Bharath Rupireddy wrote:
> Thanks! Since we are replacing custom SIGHUP and SIGTERM handlers with standard ones, how about doing the same thing in worker_spi.c? I posted a patch previously [1] in this mail thread. If it makes sense, please review it.

I agree to simplify the worker_spi code by making it use the standard signal handlers. But as far as I read Craig Ringer's comments upthread about worker_spi, it's not enough to just replace the dedicated SIGTERM handler with the standard one. ISTM that worker_spi should probably use a signal handler that handles InterruptPending and ProcDiePending like die() does. What do you think about Craig Ringer's comments?

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION
Thanks Craig!

On Fri, Oct 23, 2020 at 9:37 AM Craig Ringer <craig.ringer@enterprisedb.com> wrote:
>
> src/test/modules/test_shm_mq/worker.c appears to do the right thing the wrong way - it has its own custom handler instead of using die() or SignalHandlerForShutdownRequest().

handle_sigterm() and die() are similar except that die() has extra handling (below) for exiting immediately when waiting for input/a command from the client.

/*
 * If we're in single user mode, we want to quit immediately - we can't
 * rely on latches as they wouldn't work when stdin/stdout is a file.
 * Rather ugly, but it's unlikely to be worthwhile to invest much more
 * effort just for the benefit of single user mode.
 */
if (DoingCommandRead && whereToSendOutput != DestRemote)
    ProcessInterrupts();

Having this extra handling is correct for normal backends: as they can connect directly to clients for reading input commands, the above if condition may become true and ProcessInterrupts() may be called to handle the SIGTERM immediately.

Note that DoingCommandRead can never be true in bgworkers, as it is set to true only in the normal backend's PostgresMain(). If we use die() (instead of custom SIGTERM handlers such as handle_sigterm()) in a bgworker/non-backend process, that part of the code is never reached and the interrupts are not processed immediately. These processes already use die() instead of their own custom handlers: parallel workers, autoprewarm worker, autovacuum worker, logical replication launcher and apply worker, and the walsender process.

I think we can also use die() instead of handle_sigterm() in test_shm_mq/worker.c. I attached a patch for this change.

> In contrast src/test/modules/worker_spi/worker_spi.c looks plain wrong. Especially since it's quoted as an example of how to do things right. It won't respond to SIGTERM at all while it's executing a query from its queue, no matter how long that query takes or whether it blocks. It can inhibit even postmaster shutdown as a result.

The postmaster sends SIGTERM to all children (backends and bgworkers) for both smart shutdown (wait for children to end their work, then shut down) and fast shutdown (roll back active transactions and shut down when they are gone). For immediate shutdown, SIGQUIT is sent to the children (see pmdie()).

For each bgworker (so also for worker_spi.c), SignalHandlerForCrashExit() is set as the SIGQUIT handler in InitPostmasterChild(); it does nothing but exit immediately. We can not use quickdie() here because, as a bgworker, we don't have to/can not send anything to a client. We are good with SignalHandlerForCrashExit(), as with all other bgworkers.

Yes, with worker_spi_sigterm()/SignalHandlerForShutdownRequest, the worker_spi worker may sometimes (as explained above) not respond to SIGTERM. I think we should have die() as the SIGTERM handler here (as with normal backends and parallel workers).

Attaching another patch that replaces the custom SIGTERM handler worker_spi_sigterm() with die() and the custom SIGHUP handler worker_spi_sighup() with the standard SignalHandlerForConfigReload().

> I was going to lob off a quick patch to fix this by making both use quickdie() for SIGQUIT and die() for SIGTERM, but after reading for a bit I'm no longer sure what the right choice even is. I'd welcome some opinions.

We can not use quickdie() here because, as a bgworker, we don't have to/can not send anything to a client. We are good with SignalHandlerForCrashExit(), as with all other bgworkers.

Thoughts?

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com
On 2020/11/17 21:18, Bharath Rupireddy wrote:
> Thanks Craig!
>
> On Fri, Oct 23, 2020 at 9:37 AM Craig Ringer <craig.ringer@enterprisedb.com> wrote:
>>
>> src/test/modules/test_shm_mq/worker.c appears to do the right thing the wrong way - it has its own custom handler instead of using die() or SignalHandlerForShutdownRequest().
>
> handle_sigterm() and die() are similar except that die() has extra handling (below) for exiting immediately when waiting for input/a command from the client.
>
> /*
>  * If we're in single user mode, we want to quit immediately - we can't
>  * rely on latches as they wouldn't work when stdin/stdout is a file.
>  * Rather ugly, but it's unlikely to be worthwhile to invest much more
>  * effort just for the benefit of single user mode.
>  */
> if (DoingCommandRead && whereToSendOutput != DestRemote)
>     ProcessInterrupts();
>
> Having this extra handling is correct for normal backends: as they can connect directly to clients for reading input commands, the above if condition may become true and ProcessInterrupts() may be called to handle the SIGTERM immediately.
>
> Note that DoingCommandRead can never be true in bgworkers, as it is set to true only in the normal backend's PostgresMain(). If we use die() (instead of custom SIGTERM handlers such as handle_sigterm()) in a bgworker/non-backend process, that part of the code is never reached and the interrupts are not processed immediately. These processes already use die() instead of their own custom handlers: parallel workers, autoprewarm worker, autovacuum worker, logical replication launcher and apply worker, and the walsender process.
>
> I think we can also use die() instead of handle_sigterm() in test_shm_mq/worker.c.
>
> I attached a patch for this change.

Thanks for the patch! It looks good to me.

BTW, I tried to find the past discussion about why die() was not used, but I could not yet.

>> In contrast src/test/modules/worker_spi/worker_spi.c looks plain wrong. Especially since it's quoted as an example of how to do things right. It won't respond to SIGTERM at all while it's executing a query from its queue, no matter how long that query takes or whether it blocks. It can inhibit even postmaster shutdown as a result.
>
> The postmaster sends SIGTERM to all children (backends and bgworkers) for both smart shutdown (wait for children to end their work, then shut down) and fast shutdown (roll back active transactions and shut down when they are gone). For immediate shutdown, SIGQUIT is sent to the children (see pmdie()).
>
> For each bgworker (so also for worker_spi.c), SignalHandlerForCrashExit() is set as the SIGQUIT handler in InitPostmasterChild(); it does nothing but exit immediately. We can not use quickdie() here because, as a bgworker, we don't have to/can not send anything to a client. We are good with SignalHandlerForCrashExit(), as with all other bgworkers.
>
> Yes, with worker_spi_sigterm()/SignalHandlerForShutdownRequest, the worker_spi worker may sometimes (as explained above) not respond to SIGTERM. I think we should have die() as the SIGTERM handler here (as with normal backends and parallel workers).
>
> Attaching another patch that replaces the custom SIGTERM handler worker_spi_sigterm() with die() and the custom SIGHUP handler worker_spi_sighup() with the standard SignalHandlerForConfigReload().

This patch also looks good to me.

>> I was going to lob off a quick patch to fix this by making both use quickdie() for SIGQUIT and die() for SIGTERM, but after reading for a bit I'm no longer sure what the right choice even is. I'd welcome some opinions.
>
> We can not use quickdie() here because, as a bgworker, we don't have to/can not send anything to a client. We are good with SignalHandlerForCrashExit(), as with all other bgworkers.
>
> Thoughts?

I think you're right.

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION
On Wed, Nov 18, 2020 at 5:52 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
>
> > handle_sigterm() and die() are similar except that die() has extra
> > handling(below) for exiting immediately when waiting for input/command
> > from the client.
> > /*
> > * If we're in single user mode, we want to quit immediately - we can't
> > * rely on latches as they wouldn't work when stdin/stdout is a file.
> > * Rather ugly, but it's unlikely to be worthwhile to invest much more
> > * effort just for the benefit of single user mode.
> > */
> > if (DoingCommandRead && whereToSendOutput != DestRemote)
> > ProcessInterrupts();
> >
> > Having this extra handling is correct for normal backends as they can
> > connect directly to clients for reading input commands, the above if
> > condition may become true and ProcessInterrupts() may be called to
> > handle the SIGTERM immediately.
> >
> > Note that DoingCommandRead can never be true in bg workers as it's
> > being set to true only in normal backend PostgresMain(). If we use
> > die()(instead of custom SIGTERM handlers such as handle_sigterm()) in
> > a bg worker/non-backend process, there are no chances that the above
> > part of the code gets hit and the interrupts are processed
> > immediately. And also here are the bg worker process that use die()
> > instead of their own custom handlers: parallel workers, autoprewarm
> > worker, autovacuum worker, logical replication launcher and apply
> > worker, wal sender process
> >
> > I think we can also use die() instead of handle_sigterm() in
> > test_shm_mq/worker.c.
> >
> > I attached a patch for this change.
>
> Thanks for the patch! It looks good to me.
>
Thanks.
>
> BTW, I tried to find the past discussion about why die() was not used,
> but I could not yet.
>
I think the commit 2505ce0 that altered the comment has something: handle_sigterm() is die() stripped of the "if (DoingCommandRead && whereToSendOutput != DestRemote)" part, since the test_shm_mq bgworker - or, for that matter, any bgworker - has no equivalent of the regular backend's command-read loop.
--- a/src/test/modules/test_shm_mq/worker.c
+++ b/src/test/modules/test_shm_mq/worker.c
@@ -60,13 +60,9 @@ test_shm_mq_main(Datum main_arg)
*
* We want CHECK_FOR_INTERRUPTS() to kill off this worker process just as
* it would a normal user backend. To make that happen, we establish a
- * signal handler that is a stripped-down version of die(). We don't have
- * any equivalent of the backend's command-read loop, where interrupts can
- * be processed immediately, so make sure ImmediateInterruptOK is turned
- * off.
+ * signal handler that is a stripped-down version of die().
*/
pqsignal(SIGTERM, handle_sigterm);
- ImmediateInterruptOK = false;
BackgroundWorkerUnblockSignals();
>
> > Attaching another patch that has replaced custom SIGTERM handler
> > worker_spi_sigterm() with die() and custom SIGHUP handler
> > worker_spi_sighup() with standard SignalHandlerForConfigReload().
>
> This patch also looks good to me.
>
Thanks.
With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com
On 2020/11/20 19:33, Bharath Rupireddy wrote:
> On Wed, Nov 18, 2020 at 5:52 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
>>
>>> handle_sigterm() and die() are similar except that die() has extra handling (below) for exiting immediately when waiting for input/a command from the client.
>>>
>>> /*
>>>  * If we're in single user mode, we want to quit immediately - we can't
>>>  * rely on latches as they wouldn't work when stdin/stdout is a file.
>>>  * Rather ugly, but it's unlikely to be worthwhile to invest much more
>>>  * effort just for the benefit of single user mode.
>>>  */
>>> if (DoingCommandRead && whereToSendOutput != DestRemote)
>>>     ProcessInterrupts();
>>>
>>> Having this extra handling is correct for normal backends: as they can connect directly to clients for reading input commands, the above if condition may become true and ProcessInterrupts() may be called to handle the SIGTERM immediately.
>>>
>>> Note that DoingCommandRead can never be true in bgworkers, as it is set to true only in the normal backend's PostgresMain(). If we use die() (instead of custom SIGTERM handlers such as handle_sigterm()) in a bgworker/non-backend process, that part of the code is never reached and the interrupts are not processed immediately. These processes already use die() instead of their own custom handlers: parallel workers, autoprewarm worker, autovacuum worker, logical replication launcher and apply worker, and the walsender process.
>>>
>>> I think we can also use die() instead of handle_sigterm() in test_shm_mq/worker.c.
>>>
>>> I attached a patch for this change.
>>
>> Thanks for the patch! It looks good to me.
>
> Thanks.

When I read the patch again, I found that, with the patch, the shutdown of worker_spi causes the following FATAL message to be reported:

FATAL: terminating connection due to administrator command

Isn't this message confusing, given that it's not a connection? If so, we need to update ProcessInterrupts() so that the proper message is reported, as other bgworkers do.

BTW, though this is not directly related to this topic, it's strange to me that the normal shutdown of a bgworker emits a FATAL message and the message "background worker ... exited with exit code 1". I've sometimes heard users complain that it's confusing to report that message for the logical replication launcher at normal shutdown. The mechanism to shut down a bgworker normally seems to not have been implemented well yet.

Without the patch, since worker_spi cannot handle the shutdown request promptly while it's executing the backend code, maybe we can say this is a bug. But worker_spi is just example code, so I'm not thinking of back-patching for now. Thoughts?

> I think the commit 2505ce0 that altered the comment has something:

Thanks for the info! Will check that.

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION
On Wed, Nov 25, 2020 at 3:29 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
>
> On 2020/11/20 19:33, Bharath Rupireddy wrote:
> > On Wed, Nov 18, 2020 at 5:52 PM Fujii Masao <masao.fujii@oss.nttdata.com <mailto:masao.fujii@oss.nttdata.com>> wrote:
> > >
> > > > handle_sigterm() and die() are similar except that die() has extra
> > > > handling(below) for exiting immediately when waiting for input/command
> > > > from the client.
> > > > /*
> > > > * If we're in single user mode, we want to quit immediately - we can't
> > > > * rely on latches as they wouldn't work when stdin/stdout is a file.
> > > > * Rather ugly, but it's unlikely to be worthwhile to invest much more
> > > > * effort just for the benefit of single user mode.
> > > > */
> > > > if (DoingCommandRead && whereToSendOutput != DestRemote)
> > > > ProcessInterrupts();
> > > >
> > > > Having this extra handling is correct for normal backends as they can
> > > > connect directly to clients for reading input commands, the above if
> > > > condition may become true and ProcessInterrupts() may be called to
> > > > handle the SIGTERM immediately.
> > > >
> > > > Note that DoingCommandRead can never be true in bg workers as it's
> > > > being set to true only in normal backend PostgresMain(). If we use
> > > > die()(instead of custom SIGTERM handlers such as handle_sigterm()) in
> > > > a bg worker/non-backend process, there are no chances that the above
> > > > part of the code gets hit and the interrupts are processed
> > > > immediately. And also here are the bg worker process that use die()
> > > > instead of their own custom handlers: parallel workers, autoprewarm
> > > > worker, autovacuum worker, logical replication launcher and apply
> > > > worker, wal sender process
> > > >
> > > > I think we can also use die() instead of handle_sigterm() in
> > > > test_shm_mq/worker.c.
> > > >
> > > > I attached a patch for this change.
> > >
> > > Thanks for the patch! It looks good to me.
> > >
> >
> > Thanks.
>
> When I read the patch again, I found that, with the patch, the shutdown
> of worker_spi causes to report the following FATAL message.
>
> FATAL: terminating connection due to administrator command
>
> Isn't this message confusing because it's not a connection? If so,
> we need to update ProcessInterrupts() so that the proper message is
> reported like other bgworkers do.
>
This is also true for all the bgworkers that use the die() handler. How about doing what bgworker_die() does in ProcessInterrupts()? That would give meaningful information. Thoughts? If okay, I can make a separate patch.
else if (RecoveryConflictPending)
{
/* Currently there is only one non-retryable recovery conflict */
Assert(RecoveryConflictReason == PROCSIG_RECOVERY_CONFLICT_DATABASE);
pgstat_report_recovery_conflict(RecoveryConflictReason);
ereport(FATAL,
(errcode(ERRCODE_DATABASE_DROPPED),
errmsg("terminating connection due to conflict with recovery"),
errdetail_recovery_conflict()));
}
else if (IsBackgroundWorker)
{
ereport(FATAL,
(errcode(ERRCODE_ADMIN_SHUTDOWN),
errmsg("terminating background worker \"%s\" due to administrator command",
MyBgworkerEntry->bgw_type)));
}
else
ereport(FATAL,
(errcode(ERRCODE_ADMIN_SHUTDOWN),
errmsg("terminating connection due to administrator command")));
>
> BTW, though this is not directly related to this topic, it's strange to me
> that the normal shutdown of bgworker causes to emit FATAL message
> and the message "background worker ... exited with exit code 1"
> at the normal shutdown. I've sometimes heard that users complained that
> it's confusing to report that message for logical replication launcher
> at the normal shutdown. Maybe the mechanism to shutdown bgworker
> normally seems to have not been implemented well yet.
>
What do you mean by normal shutdown of bgworker? Is it that the bgworker has exited successfully with exit code 0, or for some reason with an exit code other than 0? Or is it when the postmaster is shut down normally?
IIUC, when a bgworker exits either normally with an exit code 0 or otherwise, CleanupBackgroundWorker() is called in the postmaster, a message (like below) is prepared, and LogChildExit() is called with either DEBUG1 or LOG level; for instance, the message you mentioned gets printed: "background worker ... exited with exit code 1". I have not seen a FATAL message similar to "background worker ... exited with exit code 1" at normal shutdown.
snprintf(namebuf, MAXPGPATH, _("background worker \"%s\""), rw->rw_worker.bgw_type);
LogChildExit(EXIT_STATUS_0(exitstatus) ? DEBUG1 : LOG, namebuf, pid, exitstatus);
Am I missing something?
If my analysis is right, then for instance, when the logical replication launcher exits, it logs "background worker "logical replication launcher" exited with exit code X" at either DEBUG1 or LOG level, but not at FATAL level.
> Without the patch, since worker_spi cannot handle the shutdown request
> promptly while it's executing the backend code, maybe we can say this is
> a bug. But worker_spi is a just example of the code, I'm not thinking to
> do back-patch for now. Thought?
>
+1 to not backpatch it.
On 2020/11/25 23:38, Bharath Rupireddy wrote:
> On Wed, Nov 25, 2020 at 3:29 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
>>
>> When I read the patch again, I found that, with the patch, the shutdown of worker_spi causes the following FATAL message to be reported:
>>
>> FATAL: terminating connection due to administrator command
>>
>> Isn't this message confusing, given that it's not a connection? If so, we need to update ProcessInterrupts() so that the proper message is reported, as other bgworkers do.
>
> This is also true for all the bgworkers that use the die() handler. How about doing what bgworker_die() does in ProcessInterrupts()? That would give meaningful information. Thoughts? If okay, I can make a separate patch.

+1 Thanks!

> else if (RecoveryConflictPending)
> {
>     /* Currently there is only one non-retryable recovery conflict */
>     Assert(RecoveryConflictReason == PROCSIG_RECOVERY_CONFLICT_DATABASE);
>     pgstat_report_recovery_conflict(RecoveryConflictReason);
>     ereport(FATAL,
>             (errcode(ERRCODE_DATABASE_DROPPED),
>              errmsg("terminating connection due to conflict with recovery"),
>              errdetail_recovery_conflict()));
> }
> else if (IsBackgroundWorker)
> {
>     ereport(FATAL,
>             (errcode(ERRCODE_ADMIN_SHUTDOWN),
>              errmsg("terminating background worker \"%s\" due to administrator command",
>                     MyBgworkerEntry->bgw_type)));
> }
> else
>     ereport(FATAL,
>             (errcode(ERRCODE_ADMIN_SHUTDOWN),
>              errmsg("terminating connection due to administrator command")));

>> BTW, though this is not directly related to this topic, it's strange to me that the normal shutdown of a bgworker emits a FATAL message and the message "background worker ... exited with exit code 1". I've sometimes heard users complain that it's confusing to report that message for the logical replication launcher at normal shutdown. The mechanism to shut down a bgworker normally seems to not have been implemented well yet.
>
> What do you mean by normal shutdown of bgworker? Is it that the bgworker has exited successfully with exit code 0, or for some reason with an exit code other than 0? Or is it when the postmaster is shut down normally?
>
> IIUC, when a bgworker exits either normally with an exit code 0 or otherwise, CleanupBackgroundWorker() is called in the postmaster, a message (like below) is prepared, and LogChildExit() is called with either DEBUG1 or LOG level; for instance, the message you mentioned gets printed: "background worker ... exited with exit code 1". I have not seen a FATAL message similar to "background worker ... exited with exit code 1" at normal shutdown.
>
> snprintf(namebuf, MAXPGPATH, _("background worker \"%s\""), rw->rw_worker.bgw_type);
>
> LogChildExit(EXIT_STATUS_0(exitstatus) ? DEBUG1 : LOG, namebuf, pid, exitstatus);
>
> Am I missing something?
>
> If my analysis is right, then for instance, when the logical replication launcher exits, it logs "background worker "logical replication launcher" exited with exit code X" at either DEBUG1 or LOG level, but not at FATAL level.

Yes, it's not at FATAL level. But that message reads as if it's reporting an error. This is why we have sometimes received complaints (e.g., [1][2]) about that message.

[1] https://postgr.es/m/CAKFQuwZHcZnnwoFaXX2YKNT3KhaNr_+vd95Oo=f045zfn7Tetw@mail.gmail.com
[2] https://postgr.es/m/CAEepm=1c3hG1g3iKYwfa_PDsT49RBaBJsaot_qNhPSCXBm9rzA@mail.gmail.com

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION
On Thu, Nov 26, 2020 at 7:37 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
>
>> What do you mean by normal shutdown of bgworker? Is it that the bgworker has exited successfully with exit code 0, or for some reason with an exit code other than 0? Or is it when the postmaster is shut down normally?
>>
>> IIUC, when a bgworker exits either normally with an exit code 0 or otherwise, CleanupBackgroundWorker() is called in the postmaster, a message (like below) is prepared, and LogChildExit() is called with either DEBUG1 or LOG level; for instance, the message you mentioned gets printed: "background worker ... exited with exit code 1". I have not seen a FATAL message similar to "background worker ... exited with exit code 1" at normal shutdown.
>>
>> snprintf(namebuf, MAXPGPATH, _("background worker \"%s\""), rw->rw_worker.bgw_type);
>>
>> LogChildExit(EXIT_STATUS_0(exitstatus) ? DEBUG1 : LOG, namebuf, pid, exitstatus);
>>
>> Am I missing something?
>>
>> If my analysis is right, then for instance, when the logical replication launcher exits, it logs "background worker "logical replication launcher" exited with exit code X" at either DEBUG1 or LOG level, but not at FATAL level.
>
> Yes, it's not at FATAL level. But that message reads as if it's reporting an error. This is why we have sometimes received complaints (e.g., [1][2]) about that message.

Oh. Should we do something about it now? Any thoughts? I have not yet read fully through the threads you mentioned; I will go through them later.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com
On 2020/11/27 0:15, Bharath Rupireddy wrote:
> On Thu, Nov 26, 2020 at 7:37 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
>>
>>> What do you mean by normal shutdown of bgworker? Is it that bgworker has exited successfully with exit code 0 or for some reason with exit code other than 0? Or is it when the postmaster is shutdown normally?
>>>
>>> IIUC, when a bgworker exits either normally with an exit code 0 or other than 0, then CleanupBackgroundWorker() is called in postmaster, a message (like below) is prepared, and LogChildExit() is called with either DEBUG1 or LOG level, and for instance the message you specified gets printed: "background worker ... exited with exit code 1". I have not seen a FATAL message similar to "background worker ... exited with exit code 1" at normal shutdown.
>>>
>>> snprintf(namebuf, MAXPGPATH, _("background worker \"%s\""), rw->rw_worker.bgw_type);
>>>
>>> LogChildExit(EXIT_STATUS_0(exitstatus) ? DEBUG1 : LOG, namebuf, pid, exitstatus);
>>>
>>> Am I missing something?
>>>
>>> If my analysis is right, then for instance, when a logical replication launcher is exited, it logs "background worker "logical replication launcher" exited with exit code X" with either DEBUG1 or LOG level but not with FATAL level.
>>
>> Yes, it's not with FATAL level. But that message looks like it's
>> reporting an error. This is why we sometimes received
>> complaints (e.g., [1][2]) about that message.
>>
>
> Oh. Should we do something about it now?

No. This is not directly related to the issue that we are discussing, as I told upthread. Of course, it would be better to work on this if we could easily fix it. But it seems we cannot... So please ignore this comment of mine.

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION
On Wed, Nov 25, 2020 at 8:08 PM Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote:
>
> > When I read the patch again, I found that, with the patch, the shutdown
> > of worker_spi causes to report the following FATAL message.
> >
> > FATAL: terminating connection due to administrator command
> >
> > Isn't this message confusing because it's not a connection? If so,
> > we need to update ProcessInterrupts() so that the proper message is
> > reported like other bgworkers do.
> >
>
> This is also true for all the bgworkers that use the die() handler. How about doing it the way bgworker_die() does in ProcessInterrupts()? This would give meaningful information. Thoughts? If okay, I can make a separate patch.
>
Attaching the patch that improves the message for bgworkers in ProcessInterrupts(). For instance, it now looks like FATAL: terminating background worker "worker_spi" due to administrator command, or FATAL: terminating background worker "parallel worker" due to administrator command, and so on for the other bgworkers. Please review the patch.
I'm also mentioning the 2 previous patches posted in [1]. One of the patches uses die() instead of handle_sigterm() in test_shm_mq/worker.c, and the other replaces the custom SIGTERM handler worker_spi_sigterm() with die() and the custom SIGHUP handler worker_spi_sighup() with the standard SignalHandlerForConfigReload().
With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com
On 2020/11/27 12:13, Bharath Rupireddy wrote:
> On Wed, Nov 25, 2020 at 8:08 PM Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote:
> >
> > > When I read the patch again, I found that, with the patch, the shutdown
> > > of worker_spi causes to report the following FATAL message.
> > >
> > > FATAL: terminating connection due to administrator command
> > >
> > > Isn't this message confusing because it's not a connection? If so,
> > > we need to update ProcessInterrupts() so that the proper message is
> > > reported like other bgworkers do.
> > >
> >
> > This is also true for all the bgworkers that use the die() handler. How about doing it the way bgworker_die() does in ProcessInterrupts()? This would give meaningful information. Thoughts? If okay, I can make a separate patch.
> >
>
> Attaching the patch that improved the message for bg workers in ProcessInterrupts(). For instance, now it looks like FATAL: terminating background worker "worker_spi" due to administrator command, or FATAL: terminating background worker "parallel worker" due to administrator command, and so on for other bg workers.
>
> Please review the patch.

Thanks for the patch! It looks good to me.

> I'm also mentioning the 2 previous patches posted in [1]. One of the patches is for using die() instead of handle_sigterm() in test_shm_mq/worker.c and another is for replacing the custom SIGTERM handler worker_spi_sigterm() with die() and the custom SIGHUP handler worker_spi_sighup() with the standard SignalHandlerForConfigReload().

Yeah, I pushed them. Thanks!

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION
On Fri, Nov 27, 2020 at 12:26 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
>
> > > > When I read the patch again, I found that, with the patch, the shutdown
> > > > of worker_spi causes to report the following FATAL message.
> > > >
> > > > FATAL: terminating connection due to administrator command
> > > >
> > > > Isn't this message confusing because it's not a connection? If so,
> > > > we need to update ProcessInterrupts() so that the proper message is
> > > > reported like other bgworkers do.
> > > >
> > >
> > > This is also true for all the bgworkers that use the die() handler. How about doing it the way bgworker_die() does in ProcessInterrupts()? This would give meaningful information. Thoughts? If okay, I can make a separate patch.
> > >
> >
> > Attaching the patch that improved the message for bg workers in ProcessInterrupts(). For instance, now it looks like FATAL: terminating background worker "worker_spi" due to administrator command, or FATAL: terminating background worker "parallel worker" due to administrator command, and so on for other bg workers.
> >
> > Please review the patch.
>
> Thanks for the patch! It looks good to me.
>

Thanks!

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com
On 2020/11/27 16:47, Bharath Rupireddy wrote:
> On Fri, Nov 27, 2020 at 12:26 PM Fujii Masao
> <masao.fujii@oss.nttdata.com> wrote:
>>
>>> > > When I read the patch again, I found that, with the patch, the shutdown
>>> > > of worker_spi causes to report the following FATAL message.
>>> > >
>>> > > FATAL: terminating connection due to administrator command
>>> > >
>>> > > Isn't this message confusing because it's not a connection? If so,
>>> > > we need to update ProcessInterrupts() so that the proper message is
>>> > > reported like other bgworkers do.
>>> > >
>>> >
>>> > This is also true for all the bgworkers that use the die() handler. How about doing it the way bgworker_die() does in ProcessInterrupts()? This would give meaningful information. Thoughts? If okay, I can make a separate patch.
>>> >
>>>
>>> Attaching the patch that improved the message for bg workers in ProcessInterrupts(). For instance, now it looks like FATAL: terminating background worker "worker_spi" due to administrator command, or FATAL: terminating background worker "parallel worker" due to administrator command, and so on for other bg workers.
>>>
>>> Please review the patch.
>>
>> Thanks for the patch! It looks good to me.
>>
>
> Thanks!

Pushed. Thanks!

Since all the patches proposed in this thread have been committed, I marked the CF entry as committed.

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION
On 2020/11/04 18:06, Fujii Masao wrote:
>
> On 2020/10/29 15:21, Fujii Masao wrote:
>>
>> On 2020/10/07 11:30, Bharath Rupireddy wrote:
>>> On Tue, Oct 6, 2020 at 11:41 AM Bharath Rupireddy
>>> <bharath.rupireddyforpostgres@gmail.com> wrote:
>>>>
>>>> On Tue, Oct 6, 2020 at 11:20 AM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
>>>>>
>>>>> +1 Or it's also ok to make each patch separately.
>>>>> Anyway, thanks!
>>>>>
>>>>
>>>> Thanks. +1 to have separate patches. I will post two separate patches
>>>> for autoprewarm and WalRcvShutdownHandler for further review. The
>>>> other two patches can be for startup.c and syslogger.c.
>>>>
>>>
>>> I'm attaching all the 4 patches here together.
>>>
>>> 1. v1-use-standard-SIGHUP-and-SIGTERM-handlers-in-autoprewarm-process.patch
>>>    -- This is the patch initially sent in this mail by me, I just renamed it.
>>> 2. v1-use-MyLatch-and-standard-SIGHUP-handler-in-startup-process.patch
>>>    -- This is the patch proposed by Fujii Masao, I just renamed it.
>>> 3. v1-use-standard-SIGHUP-hanlder-in-syslogger-process.patch
>>> 4. v1-use-standard-SIGTERM-handler-in-walreceiver-process.patch
>>>
>>> Please consider these patches for further review.
>>
>> Thanks for the patches! They look good to me. So barring any objections,
>> I will commit them one by one.
>
> I pushed the following two patches.
>
> - v1-use-standard-SIGHUP-hanlder-in-syslogger-process.patch
> - v1-use-MyLatch-and-standard-SIGHUP-handler-in-startup-process.patch

As I told in the other thread [1], I'm thinking to revert this patch because this change could have a bad side effect on the startup process waiting for a recovery conflict.

Before applying the patch, the latch that the startup process used to wait for a recovery conflict was different from the latch that the SIGHUP signal handler, walreceiver process, etc. set to wake the startup process up. So SIGHUP or the walreceiver didn't wake the startup process waiting for a recovery conflict unnecessarily.

But the patch got rid of the dedicated latch for signaling the startup process. This change forced us to use the same latch to make the startup process wait or wake up, which caused the SIGHUP signal handler or the walreceiver process to wake the startup process waiting on the latch for a recovery conflict unnecessarily frequently.

While waiting for a recovery conflict on buffer pin, deadlock needs to be checked at least every deadlock_timeout. But those frequent wakeups could prevent the deadlock timer from being triggered and could delay the deadlock checks.

Therefore, I'm thinking to revert the commits ac22929a26 and 113d3591b8.

[1] https://www.postgresql.org/message-id/a802f1c0-58d9-bd3f-bc3e-bdad54726855@oss.nttdata.com

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION
At Tue, 15 Dec 2020 23:10:28 +0900, Fujii Masao <masao.fujii@oss.nttdata.com> wrote in
>> I pushed the following two patches.
>> - v1-use-standard-SIGHUP-hanlder-in-syslogger-process.patch
>> - v1-use-MyLatch-and-standard-SIGHUP-handler-in-startup-process.patch
>
> As I told in the other thread [1], I'm thinking to revert this patch
> because this change could have a bad side effect on the startup
> process waiting for a recovery conflict.
>
> Before applying the patch, the latch that the startup process
> used to wait for a recovery conflict was different from the latch
> that the SIGHUP signal handler, walreceiver process, etc. set to
> wake the startup process up. So SIGHUP or the walreceiver didn't
> wake the startup process waiting for a recovery conflict unnecessarily.
>
> But the patch got rid of the dedicated latch for signaling
> the startup process. This change forced us to use the same latch
> to make the startup process wait or wake up, which caused the SIGHUP
> signal handler or the walreceiver process to wake the startup process
> waiting on the latch for a recovery conflict unnecessarily frequently.
>
> While waiting for a recovery conflict on buffer pin, deadlock needs
> to be checked at least every deadlock_timeout. But those frequent
> wakeups could prevent the deadlock timer from being triggered and
> could delay the deadlock checks.

I thought that spurious wakeups didn't harm. But actually ResolveRecoveryConflictWithBufferPin doesn't consider spurious wakeups. Only the timer woke up ResolveRecoveryConflictWithBufferPin before the patch came. Currently SIGHUP and XLogFlush() (on the walreceiver) also wake up the startup process.

For a moment I thought that ResolveRecoveryConflictWithBufferPin should be woken up at shutdown time by the old recovery latch, but that's not the case since it wakes up after all blockers go away. It seems simpler to me to revert the patches than to make the function properly handle spurious wakeups.

> Therefore, I'm thinking to revert the commits ac22929a26 and
> 113d3591b8.
>
> [1]
> https://www.postgresql.org/message-id/a802f1c0-58d9-bd3f-bc3e-bdad54726855@oss.nttdata.com

As a result, I agree to revert them. But I think we need to add a comment about why we don't use MyLatch for recovery wakeup after reverting them.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center
On 2020/12/16 16:24, Kyotaro Horiguchi wrote:
> At Tue, 15 Dec 2020 23:10:28 +0900, Fujii Masao <masao.fujii@oss.nttdata.com> wrote in
>>> I pushed the following two patches.
>>> - v1-use-standard-SIGHUP-hanlder-in-syslogger-process.patch
>>> - v1-use-MyLatch-and-standard-SIGHUP-handler-in-startup-process.patch
>>
>> As I told in the other thread [1], I'm thinking to revert this patch
>> because this change could have a bad side effect on the startup
>> process waiting for a recovery conflict.
>>
>> Before applying the patch, the latch that the startup process
>> used to wait for a recovery conflict was different from the latch
>> that the SIGHUP signal handler, walreceiver process, etc. set to
>> wake the startup process up. So SIGHUP or the walreceiver didn't
>> wake the startup process waiting for a recovery conflict unnecessarily.
>>
>> But the patch got rid of the dedicated latch for signaling
>> the startup process. This change forced us to use the same latch
>> to make the startup process wait or wake up, which caused the SIGHUP
>> signal handler or the walreceiver process to wake the startup process
>> waiting on the latch for a recovery conflict unnecessarily frequently.
>>
>> While waiting for a recovery conflict on buffer pin, deadlock needs
>> to be checked at least every deadlock_timeout. But those frequent
>> wakeups could prevent the deadlock timer from being triggered and
>> could delay the deadlock checks.
>
> I thought that spurious wakeups didn't harm. But actually
> ResolveRecoveryConflictWithBufferPin doesn't consider spurious
> wakeups. Only the timer woke up ResolveRecoveryConflictWithBufferPin
> before the patch came. Currently SIGHUP and XLogFlush() (on the
> walreceiver) also wake up the startup process.
>
> For a moment I thought that ResolveRecoveryConflictWithBufferPin
> should be woken up at shutdown time by the old recovery latch, but
> that's not the case since it wakes up after all blockers go away.
> It seems simpler to me to revert the patches than to make the function
> properly handle spurious wakeups.
>
>> Therefore, I'm thinking to revert the commits ac22929a26 and
>> 113d3591b8.
>>
>> [1]
>> https://www.postgresql.org/message-id/a802f1c0-58d9-bd3f-bc3e-bdad54726855@oss.nttdata.com
>
> As a result, I agree to revert them. But I think we need to add a
> comment about why we don't use MyLatch for recovery wakeup after
> reverting them.

Agreed. Attached is the patch that reverts those patches and adds comments about why both procLatch and recoveryWakeupLatch are necessary.

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION
On 2020/12/16 18:12, Fujii Masao wrote:
>
> On 2020/12/16 16:24, Kyotaro Horiguchi wrote:
>> At Tue, 15 Dec 2020 23:10:28 +0900, Fujii Masao <masao.fujii@oss.nttdata.com> wrote in
>>>> I pushed the following two patches.
>>>> - v1-use-standard-SIGHUP-hanlder-in-syslogger-process.patch
>>>> - v1-use-MyLatch-and-standard-SIGHUP-handler-in-startup-process.patch
>>>
>>> As I told in the other thread [1], I'm thinking to revert this patch
>>> because this change could have a bad side effect on the startup
>>> process waiting for a recovery conflict.
>>>
>>> Before applying the patch, the latch that the startup process
>>> used to wait for a recovery conflict was different from the latch
>>> that the SIGHUP signal handler, walreceiver process, etc. set to
>>> wake the startup process up. So SIGHUP or the walreceiver didn't
>>> wake the startup process waiting for a recovery conflict unnecessarily.
>>>
>>> But the patch got rid of the dedicated latch for signaling
>>> the startup process. This change forced us to use the same latch
>>> to make the startup process wait or wake up, which caused the SIGHUP
>>> signal handler or the walreceiver process to wake the startup process
>>> waiting on the latch for a recovery conflict unnecessarily frequently.
>>>
>>> While waiting for a recovery conflict on buffer pin, deadlock needs
>>> to be checked at least every deadlock_timeout. But those frequent
>>> wakeups could prevent the deadlock timer from being triggered and
>>> could delay the deadlock checks.
>>
>> I thought that spurious wakeups didn't harm. But actually
>> ResolveRecoveryConflictWithBufferPin doesn't consider spurious
>> wakeups. Only the timer woke up ResolveRecoveryConflictWithBufferPin
>> before the patch came. Currently SIGHUP and XLogFlush() (on the
>> walreceiver) also wake up the startup process.
>>
>> For a moment I thought that ResolveRecoveryConflictWithBufferPin
>> should be woken up at shutdown time by the old recovery latch, but
>> that's not the case since it wakes up after all blockers go away.
>> It seems simpler to me to revert the patches than to make the function
>> properly handle spurious wakeups.
>>
>>> Therefore, I'm thinking to revert the commits ac22929a26 and
>>> 113d3591b8.
>>>
>>> [1]
>>> https://www.postgresql.org/message-id/a802f1c0-58d9-bd3f-bc3e-bdad54726855@oss.nttdata.com
>>
>> As a result, I agree to revert them. But I think we need to add a
>> comment about why we don't use MyLatch for recovery wakeup after
>> reverting them.
>
> Agreed. Attached is the patch that reverts those patches and
> adds comments about why both procLatch and recoveryWakeupLatch
> are necessary.

Pushed. Thanks!

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION
Hi,

On 2020-12-16 18:12:39 +0900, Fujii Masao wrote:
> -	/* Wait to be signaled by UnpinBuffer() */
> +	/*
> +	 * Wait to be signaled by UnpinBuffer().
> +	 *
> +	 * We assume that only UnpinBuffer() and the timeout requests established
> +	 * above can wake us up here. WakeupRecovery() called by walreceiver or
> +	 * SIGHUP signal handler, etc cannot do that because it uses the different
> +	 * latch from that ProcWaitForSignal() waits on.
> +	 */
> 	ProcWaitForSignal(PG_WAIT_BUFFER_PIN);
>
> 	/*

Isn't this comment bogus? The latch could e.g. be set by procsignal_sigusr1_handler(), which the startup process uses. Or it could already be set when entering ResolveRecoveryConflictWithBufferPin(). Why is it even relevant that we only get woken up by UnpinBuffer()?

Greetings,

Andres Freund