On Tue, Jan 10, 2023 at 10:59:14AM +0530, Amit Kapila wrote:
> I haven't looked in detail but isn't it better to explain somewhere in
> the comments that it achieves to rate limit the restart of workers in
> case of error and allows them to restart immediately in case of
> subscription parameter change?
I expanded one of the existing comments to make this clear.
> Another minor point: Don't we need to set the launcher's latch after
> removing the entry from the hash table to avoid the launcher waiting
> on the latch for a bit longer?
The launcher's latch should be set when the apply worker exits. The apply
worker's notify_pid is set to the launcher, which means the launcher
will be sent SIGUSR1 on exit. The launcher's SIGUSR1 handler sets its
latch.
Of course, if the launcher restarts, then the notify_pid will no longer be
accurate. However, I see that workers also register a before_shmem_exit
callback that will send SIGUSR1 to the launcher_pid currently stored in
shared memory. (I wonder if there is a memory ordering bug here.)
--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com