Thread: Postmaster doesn't send SIGTERM to bgworker during fast shutdown when pmState == PM_STARTUP

Hello hackers,

It is possible to start a bgworker with bgw_start_time =
BgWorkerStart_PostmasterStart, which will then be started immediately
after the postmaster.

But if you try to do a fast shutdown while the postmaster is still in
pmState == PM_STARTUP, the bgworker never gets SIGTERM and the postmaster
waits forever.
By contrast, an immediate or smart shutdown works fine.

The problem is in the pmdie function. Proposed fix attached.
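
To make the symptom concrete, here is a toy model in C. It is not the postmaster source: the types and functions (ToyPostmaster, toy_fast_shutdown_buggy, toy_shutdown_can_finish) are hypothetical, and the only assumption taken from the report is that a fast shutdown received in PM_STARTUP does not signal the bgworker.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy subset of postmaster states; names follow the thread, logic is illustrative. */
typedef enum { PM_STARTUP, PM_RECOVERY, PM_WAIT_BACKENDS } ToyPMState;

typedef struct {
    ToyPMState state;
    bool startup_signaled;   /* startup process has been sent SIGTERM */
    bool bgworker_running;   /* a PostmasterStart bgworker exists */
    bool bgworker_signaled;  /* that bgworker has been sent SIGTERM */
} ToyPostmaster;

/* Assumed buggy behaviour: workers are only signaled outside PM_STARTUP. */
void toy_fast_shutdown_buggy(ToyPostmaster *pm)
{
    assert(pm != NULL);
    pm->startup_signaled = true;
    if (pm->state != PM_STARTUP) {
        if (pm->bgworker_running)
            pm->bgworker_signaled = true;
        pm->state = PM_WAIT_BACKENDS;
    }
}

/* Shutdown can only complete once every running bgworker was told to exit. */
bool toy_shutdown_can_finish(const ToyPostmaster *pm)
{
    assert(pm != NULL);
    return !pm->bgworker_running || pm->bgworker_signaled;
}
```

With state set to PM_STARTUP and a running bgworker, toy_shutdown_can_finish() stays false after the call, which corresponds to the postmaster waiting forever.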


Regards,
--
Alexander Kukushkin

On Sun, Aug 26, 2018 at 06:00:23PM +0200, Alexander Kukushkin wrote:
> It is possible to start a bgworker with bgw_start_time =
> BgWorkerStart_PostmasterStart, which will then be started immediately
> after the postmaster.

Right.

> But if you try to do a fast shutdown while the postmaster is still in
> pmState == PM_STARTUP, the bgworker never gets SIGTERM and the postmaster
> waits forever.
> By contrast, an immediate or smart shutdown works fine.
>
> The problem is in the pmdie function. Proposed fix attached.

That seems like a good catch and a correct fix to me.  The handling of
SIGINT is inconsistent with SIGTERM in pmdie().  I would just add a
comment to mention that at this stage only the startup process is
running, and that it has been signaled already.  I'll commit that
tomorrow.
--
Michael

On Mon, Aug 27, 2018 at 07:34:55PM -0700, Michael Paquier wrote:
> That seems like a good catch and a correct fix to me.  The handling of
> SIGINT is inconsistent with SIGTERM in pmdie().  I would just add a
> comment to mention that at this stage only the startup process is
> running, and that it has been signaled already.  I'll commit that
> tomorrow.

I have been studying your patch, but it seems to me that this is not
complete as other processes could have been started before switching
from PM_STARTUP to PM_RECOVERY.  I am talking here about the bgwriter
and the checkpointer as well.  Shouldn't we switch pmState to
PM_WAIT_BACKENDS?  Your patch is missing that.
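
To illustrate the question, a toy sketch in C of a handler that treats PM_STARTUP like PM_RECOVERY and then records that it is waiting for children. Everything here (ToyChild, toy_fast_shutdown, toy_unsignaled) is hypothetical and is not the postmaster code; in particular, the sketch signals every tracked child at once and does not model the order in which the real auxiliary processes are stopped.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum { PM_STARTUP, PM_RECOVERY, PM_WAIT_BACKENDS } ToyPMState;

typedef enum {
    CHILD_STARTUP,
    CHILD_BGWRITER,
    CHILD_CHECKPOINTER,
    CHILD_BGWORKER
} ToyChildKind;

typedef struct {
    ToyChildKind kind;
    bool signaled;  /* true once the child has been sent SIGTERM */
} ToyChild;

/*
 * Hypothetical fast-shutdown handler: in PM_STARTUP or PM_RECOVERY, signal
 * all tracked children and move to PM_WAIT_BACKENDS, which in this toy
 * simply means "waiting for children to exit".
 */
ToyPMState toy_fast_shutdown(ToyPMState state, ToyChild *children, size_t n)
{
    assert(children != NULL || n == 0);
    if (state == PM_STARTUP || state == PM_RECOVERY) {
        for (size_t i = 0; i < n; i++)
            children[i].signaled = true;
        state = PM_WAIT_BACKENDS;
    }
    return state;
}

/* Count children that were never told to exit. */
size_t toy_unsignaled(const ToyChild *children, size_t n)
{
    size_t count = 0;
    for (size_t i = 0; i < n; i++)
        if (!children[i].signaled)
            count++;
    return count;
}
```

The point of the sketch is only that the state change and the signaling go together: once the state is PM_WAIT_BACKENDS, no tracked child is left unsignaled.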
--
Michael

Hi,

2018-08-29 1:24 GMT+02:00 Michael Paquier <michael@paquier.xyz>:

> I have been studying your patch, but it seems to me that this is not
> complete as other processes could have been started before switching
> from PM_STARTUP to PM_RECOVERY.  I am talking here about the bgwriter
> and the checkpointer as well.  Shouldn't we switch pmState to
> PM_WAIT_BACKENDS?  Your patch is missing that.

Yeah, good catch: the postmaster starts the checkpointer, the bgwriter
and in some cases even the archiver (when archive_mode=always) while
pmState is still equal to PM_STARTUP.
Please find attached the new version of the fix.


Regards,
--
Alexander Kukushkin

On Wed, Aug 29, 2018 at 09:09:08AM +0200, Alexander Kukushkin wrote:
> Yeah, good catch: the postmaster starts the checkpointer, the bgwriter
> and in some cases even the archiver (when archive_mode=always) while
> pmState is still equal to PM_STARTUP.
> Please find attached the new version of the fix.

Thanks, pushed and back-patched down to 9.5, which is where the bug was
introduced; before that, SignalUnconnectedWorkers() was doing all the
work.
--
Michael
