Re: Synchronizing slots from primary to standby - Mailing list pgsql-hackers

From Amit Kapila
Subject Re: Synchronizing slots from primary to standby
Msg-id CAA4eK1+3tCskiT3ma5NCdqBX+_BaJ1sRyMFrJB-ObN7HHe_8jg@mail.gmail.com
In response to Re: Synchronizing slots from primary to standby  (Masahiko Sawada <sawada.mshk@gmail.com>)
Responses Re: Synchronizing slots from primary to standby
List pgsql-hackers
On Mon, Feb 5, 2024 at 7:56 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> ---
> Since two processes (e.g. the slotsync worker and
> pg_sync_replication_slots()) concurrently fetch and update the slot
> information, there is a race condition where slot's
> confirmed_flush_lsn goes backward.
>

Right, this is possible, though it shouldn't be a problem because
slotsync is an asynchronous process anyway. As long as we hold
restart_lsn, the required WAL won't be removed. Having said that, I
can think of two ways to avoid it:

(a) We can have a flag in shared memory with which we can detect
whether another process is doing slot synchronization, and then
either error out, simply wait, or take a nowait-style parameter from
the user to decide what to do. If this is feasible, we can simply
error out in the first version and extend it later if we see any use
cases for it.

(b) Similar to restart_lsn, raise an error if confirmed_flush_lsn is
being moved backward. This is good for now, but in the future we may
still hit another similar issue.

So I would prefer (a) of these, but I am fine if you prefer (b) or
have some other idea, such as just noting in comments that this is a
harmless case that can happen only very rarely.
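
As a rough illustration of options (a) and (b), here is a minimal,
standalone sketch. It is not code from the patch: SlotSyncCtx,
slotsync_try_begin(), slotsync_end() and confirmed_flush_can_advance()
are hypothetical names, XLogRecPtr is a local stand-in typedef, and a
C11 atomic flag stands in for whatever shared-memory state and locking
the real implementation would use.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for PostgreSQL's LSN type (assumption for this sketch). */
typedef uint64_t XLogRecPtr;

/* Hypothetical shared state; in the server this would live in shared memory. */
typedef struct SlotSyncCtx
{
    atomic_bool syncing;    /* true while some process is syncing slots */
} SlotSyncCtx;

/*
 * Option (a): claim the right to synchronize slots.
 * Returns true if the caller now owns the flag, false if another
 * process is already syncing (the caller could then error out or wait).
 */
static bool
slotsync_try_begin(SlotSyncCtx *ctx)
{
    bool expected = false;

    return atomic_compare_exchange_strong(&ctx->syncing, &expected, true);
}

/* Release the flag once synchronization is finished. */
static void
slotsync_end(SlotSyncCtx *ctx)
{
    assert(atomic_load(&ctx->syncing));
    atomic_store(&ctx->syncing, false);
}

/*
 * Option (b): check that the value fetched from the primary would not
 * move the local confirmed_flush_lsn backward. The caller would raise
 * an error when this returns false.
 */
static bool
confirmed_flush_can_advance(XLogRecPtr local_lsn, XLogRecPtr remote_lsn)
{
    return remote_lsn >= local_lsn;
}
```

With (a), a second caller of slotsync_try_begin() gets false until the
first one calls slotsync_end(), which is the point at which the first
version could simply error out.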

>
> ---
> +     It is recommended that subscriptions are first disabled before promoting
> +     the standby and are enabled back after altering the connection string.
>
> I think it's better to describe the reason why it's recommended to
> disable subscriptions before the standby promotion.
>

Agreed. The reason I see for this is that if we don't disable the
subscription before promoting and changing the connection string,
there is a chance that the old primary comes back and the subscriber
ends up with some additional data, though the chances of that are
low.

> ---
> +/* Slot sync worker objects */
> +extern PGDLLIMPORT char *PrimaryConnInfo;
> +extern PGDLLIMPORT char *PrimarySlotName;
>
> These two variables are declared also in xlogrecovery.h. Is it
> intentional? If so, I think it's better to write comments.
>
> ---
> Global functions and variables used by the slotsync worker are
> declared in logicalworker.h and worker_internal.h. But is it really
> okay to make a dependency between the slotsync worker and logical
> replication workers? IIUC the slotsync worker is conceptually a
> separate feature from the logical replication. I think the slotsync
> worker can have its own header file.
>

+1.

>
> ---
> +     Confirm that the standby server is not lagging behind the subscribers.
> +     This step can be skipped if
> +     <link linkend="guc-standby-slot-names"><varname>standby_slot_names</varname></link>
> +     has been correctly configured.
>
> How can the user confirm if standby_slot_names is correctly configured?
>

I think users can refer to the server log to see whether it has
changed since the first time it was configured. I tried changing an
existing parameter and saw the following in the log:
LOG:  received SIGHUP, reloading configuration files
2024-02-06 11:38:59.069 IST [9240] LOG:  parameter "autovacuum" changed to "on"

If the user can't confirm then it is better to follow the steps
mentioned in the patch. Do you want something else to be written in
docs for this? If so, what?

--
With Regards,
Amit Kapila.


