Re: Use SIGTERM instead of SIGUSR1 for slotsync worker to exit during promotion? - Mailing list pgsql-hackers

From Fujii Masao
Subject Re: Use SIGTERM instead of SIGUSR1 for slotsync worker to exit during promotion?
Msg-id CAHGQGwHABvuCoyM24HUiFZ5oJq_CoFomjt_cqD-0cJLMjFXJjQ@mail.gmail.com
In response to Re: Use SIGTERM instead of SIGUSR1 for slotsync worker to exit during promotion?  (Amit Kapila <amit.kapila16@gmail.com>)
List pgsql-hackers
On Sun, Mar 22, 2026 at 1:52 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Wed, Mar 18, 2026 at 9:35 PM Fujii Masao <masao.fujii@gmail.com> wrote:
> >
> > I noticed that during standby promotion the startup process sends SIGUSR1 to
> > the slotsync worker to make it exit. Is there a reason for using SIGUSR1?
> >
>
> IIRC, this same signal is used for both the backend executing
> pg_sync_replication_slots() and slotsync worker. We want the worker to
> exit and the backend to raise an error. Using SIGTERM for the backend could
> make it exit instead.

Why do we want the backend running pg_sync_replication_slots() to throw
an error here, rather than just exit? If emitting an error is really required,
another option would be to store the process type in SlotSyncCtx and send
different signals accordingly, for example, SIGTERM for the slotsync worker
and another signal for a backend. But it seems simpler and sufficient to have
the backend exit in this case as well.


> Also, we want the last slotsync cycle to complete before promotion so
> that subscribers that fail over or switch over to the new primary have
> a better chance of finding failover slots sync-ready.

I'm not sure how much this behavior helps in failover/switchover scenarios.
But the main issue is that if a primary crash triggers standby promotion,
that last slotsync cycle can get stuck waiting for input from the primary,
which delays promotion. IOW, failover time can become unnecessarily long
due to the slotsync worker. I'd like to address that problem.

Regards,

--
Fujii Masao
