Re: Improve pg_sync_replication_slots() to wait for primary to advance - Mailing list pgsql-hackers

From Ajin Cherian
Subject Re: Improve pg_sync_replication_slots() to wait for primary to advance
Date
Msg-id CAFPTHDYBZytxLaOsU1FyhgNC_BkfXPUQPihNxUB2dYp-GwTEFA@mail.gmail.com
In response to Re: Improve pg_sync_replication_slots() to wait for primary to advance  (shveta malik <shveta.malik@gmail.com>)
Responses Re: Improve pg_sync_replication_slots() to wait for primary to advance
Re: Improve pg_sync_replication_slots() to wait for primary to advance
List pgsql-hackers
On Thu, Aug 14, 2025 at 4:44 PM shveta malik <shveta.malik@gmail.com> wrote:
>
> On Thu, Aug 14, 2025 at 7:28 AM Ajin Cherian <itsajin@gmail.com> wrote:
> >
> >
> > Patch v6 attached.
> >
>
> Thanks Ajin. Please find my comments:
>
> 1)
> SyncReplicationSlots:
> + remote_slots = fetch_or_refresh_remote_slots(wrconn, NIL);
> +
> + /* Retry until all slots are sync ready atleast */
> + for (;;)
> + {
> + bool some_slot_updated = false;
> +
> + /*
> + * Refresh the remote slot data. We keep using the original slot
> + * list, even if some slots are already sync ready, so that all
> + * slots are updated with the latest status from the primary.
> + */
> + remote_slots = fetch_or_refresh_remote_slots(wrconn, remote_slots);
>
> When the API  begins, it seems we are fetching remote_list twice
> before we even sync it once. We can get rid of
> 'fetch_or_refresh_remote_slots' from outside the loop and retain the
> inside one. At first call, remote_slots will be NIL and thus it will
> fetch all slots and in subsequent calls, it will be populated one.
>

Fixed.

>
> 2)
> SyncReplicationSlots:
> + /*
> + * The syscache access in fetch_or_refresh_remote_slots() needs a
> + * transaction env.
> + */
> + if (!IsTransactionState()) {
> + StartTransactionCommand();
> + started_tx = true;
> + }
>
> + if (started_tx)
> + CommitTransactionCommand();
>
> Shall we move these 2 inside fetch_or_refresh_remote_slots() (both
> worker and APi flow) similar to how validate_remote_info() also has it
> inside?
>

I tried this, but it doesn't work: when the transaction is
committed, the memory allocated for the remote slots is also freed.
So, this needs to stay on the outside.

> 3)
> SyncReplicationSlots:
> + /* Done if all slots are atleast sync ready */
> + if (!SlotSyncCtx->slot_not_persisted)
> + break;
> + else
> + {
> + /* wait for 2 seconds before retrying */
> + wait_for_slot_activity(some_slot_updated, true);
>
> No need to have 'else' block here. The code can be put without having
> 'else' because 'if' when true, breaks from the loop.
>

Fixed.

>
> 4)
> 'fetch_or_refresh_remote_slots' can be renamed to 'fetch_remote_slots'
> simply and then a comment can define an extra argument. Because
> ultimately we are re-fetching some/all slots in both cases.
>

Done.

> 5)
> In the case of API, wait_for_slot_activity() does not change its wait
> time based on 'some_slot_updated'. I think we can pull 'WaitLatch,
> ResetLatch' in API-function itself and lets not change worker's
> wait_for_slot_activity().
>

Done.

> 6)
> fetch_or_refresh_remote_slots:
> + {
> + if (is_refresh)
> + {
> + ereport(WARNING,
> + errmsg("could not fetch updated failover logical slots info"
> +    " from the primary server: %s",
> +    res->err));
> + pfree(query.data);
> + return remote_slot_list; /* Return original list on refresh failure */
> + }
> + else
> + {
> + ereport(ERROR,
> + errmsg("could not fetch failover logical slots info from the primary
> server: %s",
> +    res->err));
> + }
> + }
>
> I think there is no need for different behaviour here for worker and
> API. Since worker errors-out here, we can make API also error-out.
>

Fixed.

> 7)
> +fetch_or_refresh_remote_slots(WalReceiverConn *wrconn, List *remote_slot_list)
>
> We can name the argument as 'target_slot_list' and replace the name
> 'updated_slot_list' with 'remote_slot_list'.
>

Fixed.

>
> 8)
> + /* If refreshing, free the original list structures */
> + if (is_refresh)
> + {
> + foreach(lc, remote_slot_list)
> + {
> + RemoteSlot *old_slot = (RemoteSlot *) lfirst(lc);
> + pfree(old_slot);
> + }
> + list_free(remote_slot_list);
> + }
>
> We can get rid of 'is_refresh' and can simply check if
> 'target_slot_list != NIL', free it. We can use list_free_deep instead
> of freeing each element. Having said that, it looks slightly odd to
> free the list in this function, I will think  more here. Meanwhile, we
> can do this.
>

Fixed.

>
> 9)
> -update_and_persist_local_synced_slot(RemoteSlot *remote_slot, Oid remote_dbid)
> +update_and_persist_local_synced_slot(WalReceiverConn * wrconn,
> + RemoteSlot * remote_slot, Oid remote_dbid)
>
> We can get rid of wrconn as we are not using it. Same with wrconn
> argument for synchronize_one_slot()
>

Done.

> 10)
> + /* used by pg_sync_replication_slots() API only */
> + bool slot_not_persisted;
>
> We can move comment outside structure. We can first define it and then
> say the above line.
>

Done.

>
> 11)
> + SlotSyncCtx->slot_not_persisted = false;
>
> This may overwrite the 'slot_not_persisted' set for the previous slot
> and ultimately make it 'false' at the end of cycle even though we had
> few not-persisted slots in the beginning of cycle. Should it be:
>
> SlotSyncCtx->slot_not_persisted |= false;
>

Fixed.

> 12)
> Shall we rename this to : slot_persistence_pending (based on many
> other modules using similar names: detach_pending, send_pending,
> callback_pending)?
>

Done.

> 13)
> - errmsg("could not synchronize replication slot \"%s\"",
> -    remote_slot->name),
> - errdetail("Synchronization could lead to data loss, because the
> remote slot needs WAL at LSN %X/%08X and catalog xmin %u, but the
> standby has LSN %X/%08X and catalog xmin %u.",
> -   LSN_FORMAT_ARGS(remote_slot->restart_lsn),
> -   remote_slot->catalog_xmin,
> -   LSN_FORMAT_ARGS(slot->data.restart_lsn),
> -   slot->data.catalog_xmin));
> + errmsg("Replication slot \"%s\" is not sync ready; will keep retrying",
> +    remote_slot->name),
> + errdetail("Attempting Synchronization could lead to data loss,
> because the remote slot needs WAL at LSN %X/%08X and catalog xmin %u,
> but the standby has LSN %X/%08X and catalog xmin %u.",
> + LSN_FORMAT_ARGS(remote_slot->restart_lsn),
> + remote_slot->catalog_xmin,
> + LSN_FORMAT_ARGS(slot->data.restart_lsn),
> + slot->data.catalog_xmin));
>
> We can retain the same message as it was put after a lot of
> discussion. We can attempt to change if others comment. The idea is
> since a worker dumps it in each subsequent cycle (if such a situation
> arises), on the same basis now the API can also do so because it is
> also performing multiple cycles now. Earlier I had suggested changing
> it for API based on messages 'continuing to wait..' which  are no
> longer there now.
>

Done.


On Thu, Aug 14, 2025 at 10:44 PM Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:
>
> On Thu, Aug 14, 2025 at 12:14 PM shveta malik <shveta.malik@gmail.com> wrote:
> >
> >
> > 8)
> > + /* If refreshing, free the original list structures */
> > + if (is_refresh)
> > + {
> > + foreach(lc, remote_slot_list)
> > + {
> > + RemoteSlot *old_slot = (RemoteSlot *) lfirst(lc);
> > + pfree(old_slot);
> > + }
> > + list_free(remote_slot_list);
> > + }
> >
> > We can get rid of 'is_refresh' and can simply check if
> > 'target_slot_list != NIL', free it. We can use list_free_deep instead
> > of freeing each element. Having said that, it looks slightly odd to
> > free the list in this function, I will think  more here. Meanwhile, we
> > can do this.
> >
>
> +1.  The function prologue doesn't mention that the original list is
> deep freed. So a caller may try to access it after this call, which
> will lead to a crash. As a safe programming practice we should let the
> caller free the original list if it is not needed anymore OR modify
> the input list in-place and return it for the convenience of the
> caller like all list_* interfaces. At least we should document this
> behavior in the function prologue. You could also use foreach_ptr
> instead of foreach.
>

I've changed the logic so that it is the responsibility of the caller
to free the list.

> > 13)
> > - errmsg("could not synchronize replication slot \"%s\"",
> > -    remote_slot->name),
> > - errdetail("Synchronization could lead to data loss, because the
> > remote slot needs WAL at LSN %X/%08X and catalog xmin %u, but the
> > standby has LSN %X/%08X and catalog xmin %u.",
> > -   LSN_FORMAT_ARGS(remote_slot->restart_lsn),
> > -   remote_slot->catalog_xmin,
> > -   LSN_FORMAT_ARGS(slot->data.restart_lsn),
> > -   slot->data.catalog_xmin));
> > + errmsg("Replication slot \"%s\" is not sync ready; will keep retrying",
> > +    remote_slot->name),
> > + errdetail("Attempting Synchronization could lead to data loss,
> > because the remote slot needs WAL at LSN %X/%08X and catalog xmin %u,
> > but the standby has LSN %X/%08X and catalog xmin %u.",
> > + LSN_FORMAT_ARGS(remote_slot->restart_lsn),
> > + remote_slot->catalog_xmin,
> > + LSN_FORMAT_ARGS(slot->data.restart_lsn),
> > + slot->data.catalog_xmin));
> >
> > We can retain the same message as it was put after a lot of
> > discussion. We can attempt to change if others comment. The idea is
> > since a worker dumps it in each subsequent cycle (if such a situation
> > arises), on the same basis now the API can also do so because it is
> > also performing multiple cycles now. Earlier I had suggested changing
> > it for API based on messages 'continuing to wait..' which  are no
> > longer there now.
>
> Also we usually don't use capital letters at the start of the error
> message. Any reason this is different?
>

Retained the old message.


> Some more
>
> + * When called from pg_sync_replication_slots, use a fixed 2
> + * second wait time.
>
> Function prologue doesn't mention this. Probably the prologue should
> contain only the first sentence there. Rest of the prologues just
> repeat comments in the function. The function is small enough that a
> reader could read the details from the function instead of the
> prologue.
>
> + wait_time = SLOTSYNC_API_NAPTIME_MS;
> + } else {
>
> } else and { should be on separate lines.
>

I've removed the changes in this function and it is now the same as before.

Attaching patch v7 addressing all the above comments.

regards,
Ajin Cherian
Fujitsu Australia

