Re: How can end users know the cause of LR slot sync delays? - Mailing list pgsql-hackers

From Shlok Kyal
Subject Re: How can end users know the cause of LR slot sync delays?
Date
Msg-id CANhcyEVFZN2Mkjs0QHshKm2_3AkQ0eufjkD12eL2MeuVkPyGbw@mail.gmail.com
Whole thread Raw
In response to RE: How can end users know the cause of LR slot sync delays?  ("Hayato Kuroda (Fujitsu)" <kuroda.hayato@fujitsu.com>)
List pgsql-hackers
On Wed, 17 Sept 2025 at 16:24, Hayato Kuroda (Fujitsu)
<kuroda.hayato@fujitsu.com> wrote:
>
> Dear Shlok,
>
> Thanks for creating the patch. Personally I prefer approach2; approach1 cannot
> indicate the current status of synchronization, it just shows the history.
> I feel approach2 has more information than approach1.
>
I agree with your point. I think the preferred behaviour of
slot_sync_skip_reason is to indicate the current status of
synchronization. And taking account the current behaviour of columns
of views pg_replication_slots and pg_stat_replication_slots, I think
slot_sync_skip_reason is suited for pg_replication_slots view and
I am proceeding ahead with approach2

> Below are my comments for your patch.
>
> 01.
> ```
> +        Number of times the slot sync is skipped.
> ```
>
> "slot sync" should be "slot synchronization". Same thing can be saied for other
> attributes.
>
> 02.
> ```
> +        Reason of the last slot sync skip.
> ```
>
> Possible values must be clarified.
>
> 03.
> ```
> +            s.slot_sync_skip_count,
> +            s.last_slot_sync_skip,
> +                       s.slot_sync_skip_reason,
> ```
>
> Last line has tab-blank but others have space-blank.
>
> 04.
> ```
> +typedef enum SlotSyncSkipReason
> +{
> +       SLOT_SYNC_SKIP_NONE,            /* No skip */
> +       SLOT_SYNC_SKIP_STANDBY_BEHIND,  /* Standby is behind the remote slot */
> +       SLOT_SYNC_SKIP_REMOTE_BEHIND,   /* Remote slot is behind the local slot */
> +       SLOT_SYNC_SKIP_NO_CONSISTENT_SNAPSHOT   /* Standby could not reach a
> +                                                                                        * consistent snapshot */
> +}                      SlotSyncSkipReason
> ```
>
> a.
> Can we add comment atop the enum?
>
> b.
> SLOT_SYNC_SKIP_STANDBY_BEHIND is misleading; it indicates the standby server has
> not received WALs required by slots, but enum does not clarify it.
> How about SLOT_SYNC_SKIP_MISSING_WAL_RECORDS or something? Better naming are very
> welcome.
>
> c
> s/reach/build/.
>
> 05.
> ```
> @@ -646,11 +652,13 @@ synchronize_one_slot(RemoteSlot *remote_slot, Oid remote_dbid)
>                                            remote_slot->name,
>                                            LSN_FORMAT_ARGS(latestFlushPtr)));
>
> -               return false;
> +               /* If the slot is not present on the local */
> +               if (!(slot = SearchNamedReplicationSlot(remote_slot->name, true)))
> +                       return false;
>         }
> ```
>
> Looks like if users try to sync slots via SQL interfaces, the statistics cannot
> be updated.
>
> 06. update_slot_sync_skip_stats
> ```
> +       /* Update the slot sync reason */
> +       SpinLockAcquire(&slot->mutex);
> +       slot->slot_sync_skip_reason = skip_reason;
> +       SpinLockRelease(&slot->mutex);
> ```
>
> I feel no need to update the reason if the slot->slot_sync_skip_reason
> and skip_reason are the same.
>
> 07. synchronize_one_slot
> ```
> +               /*
> +                * If standby is behind remote slot and the synced slot is present on
> +                * local.
> +                */
> +               if (remote_slot->confirmed_lsn > latestFlushPtr)
> +               {
> +                       if (synced)
> +                               update_slot_sync_skip_stats(slot, skip_reason);
> +                       return false;
> +               }
> ```
>
> This condition exist in the same function; can we combine?
>
> 08. GetSlotSyncSkipReason()
>
> Do we have to do pstrdup() here? I found a similar function get_snapbuild_state_desc(),
> and it does not use.
>
I check similar functions and I think we can remove pstrup. Removed in
the latest patch.

> 09.
>
> Can you consider a test for added codes?
Added test in 0002 patch

I have also addressed remaining comments and attached the latest
version of patch.

Thanks,
Shlok Kyal

Attachment

pgsql-hackers by date:

Previous
From: Michael Banck
Date:
Subject: Re: GNU/Hurd portability patches
Next
From: Shlok Kyal
Date:
Subject: Re: How can end users know the cause of LR slot sync delays?