Re: Synchronizing slots from primary to standby - Mailing list pgsql-hackers

From Amit Kapila
Subject Re: Synchronizing slots from primary to standby
Date
Msg-id CAA4eK1LrZuDLGu6saZ8RDs7gKLM0dmafq+9oqRPKKN1cEEao_Q@mail.gmail.com
Whole thread Raw
In response to Re: Synchronizing slots from primary to standby  ("Drouvot, Bertrand" <bertranddrouvot.pg@gmail.com>)
Responses Re: Synchronizing slots from primary to standby
Re: Synchronizing slots from primary to standby
Re: Synchronizing slots from primary to standby
List pgsql-hackers
On Mon, Apr 17, 2023 at 7:37 PM Drouvot, Bertrand
<bertranddrouvot.pg@gmail.com> wrote:
>
> Please find attached V5 (a rebase of V4 posted up-thread).
>
> In addition to the "rebasing" work, the TAP test adds a test about conflict handling (logical slot invalidation)
> relying on the work done in the "Minimal logical decoding on standby" patch series.
>
> I did not look more at the patch (than what's was needed for the rebase) but plan to do so.
>

Are you still planning to continue working on this? Some miscellaneous
comments while going through this patch are as follows?

1. Can you please try to explain the functionality of the overall
patch somewhere in the form of comments and or commit message?

2. It seems that the initially synchronized list of slots is only used
to launch a per-database worker to synchronize all the slots
corresponding to that database. If so, then why do we need to fetch
all the slot-related information via LIST_SLOTS command?

3. As mentioned in the initial email, I think it would be better to
replace LIST_SLOTS command with a SELECT query.

4. How the limit of sync_slot workers is decided? Can we document such
a piece of information? Do we need a new GUC to decide the number of
workers? Ideally, it would be better to avoid GUC, can we use any
existing logical replication workers related GUC?

5. Can we separate out the functionality related to standby_slot_names
in a separate patch, probably the first one? I think that will patch
easier to review.

6. In libpqrcv_list_slots(), two-phase related slot information is not
retrieved. Is there a reason for the same?

7.
+static void
+wait_for_standby_confirmation(XLogRecPtr commit_lsn)

Some comments atop this function would make it easier to review.

8.
+/*-------------------------------------------------------------------------
+ * slotsync.c
+ *    PostgreSQL worker for synchronizing slots to a standby from primary
+ *
+ * Copyright (c) 2016-2018, PostgreSQL Global Development Group
+ *

The copyright notice is out-of-date.

9. Why synchronize_one_slot() compares
MyReplicationSlot->data.restart_lsn with the value of
confirmed_flush_lsn passed to it? Also, why it does only for new slots
but not existing slots?

10. Can we somehow test if the restart_lsn is advanced properly after
sync? I think it is important to ensure that because otherwise after
standby's promotion, the subscriber can start syncing from the wrong
position.

--
With Regards,
Amit Kapila.



pgsql-hackers by date:

Previous
From: Alvaro Herrera
Date:
Subject: Re: Use the enum value CRS_EXPORT_SNAPSHOT instead of "0"
Next
From: Nikita Malakhov
Date:
Subject: Re: Pluggable toaster