Re: How can end users know the cause of LR slot sync delays? - Mailing list pgsql-hackers

From Amit Kapila
Subject Re: How can end users know the cause of LR slot sync delays?
Msg-id CAA4eK1+qo3FgE95xPeEMG6Kt6U6TLJUm9hXnT9MXbi7Xk7OPcA@mail.gmail.com
In response to Re: How can end users know the cause of LR slot sync delays?  (Ashutosh Sharma <ashu.coek88@gmail.com>)
List pgsql-hackers
On Fri, Sep 5, 2025 at 12:50 PM Ashutosh Sharma <ashu.coek88@gmail.com> wrote:
>
> Good to hear that you’re also interested in working on this task.
>
> On Thu, Sep 4, 2025 at 8:26 PM Shlok Kyal <shlok.kyal.oss@gmail.com> wrote:
>>
>> Hi Ashutosh,
>>
>> I am also interested in this thread and was working on a patch for it.
>>
>> On Wed, 3 Sept 2025 at 17:52, Ashutosh Sharma <ashu.coek88@gmail.com> wrote:
>> >
>> > Hi Amit,
>> >
>> > On Thu, Aug 28, 2025 at 3:26 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>> >>
>> >> On Thu, Aug 28, 2025 at 11:07 AM Ashutosh Sharma <ashu.coek88@gmail.com> wrote:
>> >> >
>> >> > We have seen cases where slot synchronization gets delayed, for example when the slot is behind the failover
>> >> > standby or vice versa, and the slot sync worker has to wait for one to catch up with the other. During this waiting
>> >> > period, users querying pg_replication_slots can only see whether the slot has been synchronized or not. If it has
>> >> > already synchronized, that’s fine, but if synchronization is taking longer, users would naturally want to understand the
>> >> > reason for the delay.
>> >> >
>> >> > Is there a way for end users to know the cause of slot synchronization delays, so they can take appropriate
>> >> > actions to speed it up?
>> >> >
>> >> > I understand that server logs are emitted in such cases, but logs are not something end users would want to
>> >> > check regularly. Moreover, since logging is configuration-based, relevant messages may sometimes be skipped or
>> >> > suppressed.
>> >> >
>> >>
>> >> Currently, the way to see the reason for sync skip is LOGs but I think
>> >> it is better to add a new column like sync_skip_reason in
>> >> pg_replication_slots. This can show the reasons like
>> >> standby_LSN_ahead_remote_LSN. I think ideally users can compare
>> >> standby's slot LSN/XMIN with remote_slot being synced. Do you have any
>> >> better ideas?
>> >>
>> >
>> > I have similar thoughts, but for clarity, I’d like to outline some of the key steps I plan to take:
>> >
>> > Step 1: Define an enum for all possible reasons a slot persistence was skipped.
>> >
>> > /*
>> >  * Reasons why a replication slot sync was skipped.
>> >  */
>> > typedef enum ReplicationSlotSyncSkipReason
>> > {
>> >     RS_SYNC_SKIP_NONE = 0,                 /* No skip */
>> >
>> >     RS_SYNC_SKIP_REMOTE_BEHIND = (1 << 0), /* Remote slot is behind local reserved LSN */
>> >
>> >     RS_SYNC_SKIP_DATA_LOSS = (1 << 1),     /* Local slot ahead of remote, risk of data loss */
>> >
>> >     RS_SYNC_SKIP_NO_SNAPSHOT = (1 << 2)    /* Standby could not build a consistent snapshot */
>> > } ReplicationSlotSyncSkipReason;
>> >
>> > --
>> >
>> I think we should also add the case when "remote_slot->confirmed_lsn >
>> latestFlushPtr" (WAL corresponding to the confirmed lsn on remote slot
>> is still not flushed on the Standby). In this case as well we are
>> skipping the slot sync.
>
>
> Yes, we can include this case as well.
>
>>
>>
>> > Step 2: Introduce new column to pg_replication_slots to store the skip reason
>> >
>> > /* Inside pg_replication_slots table */
>> > ReplicationSlotSyncSkipReason slot_sync_skip_reason;
>> >
>> > --
>> >
>> As per the discussion [1], I think it is more of stat related data and
>> we should add it in the pg_stat_replication_slots view. Also we can
>> add columns for 'slot sync skip count' and 'last slot sync skip'.
>> Thoughts?
>
>
> It’s not a bad choice, but what makes it a bit confusing for me is that some of the slot sync information is stored
> in pg_replication_slots, while some is in pg_stat_replication_slots.
>

How about keeping sync_skip_reason in pg_replication_slots and
sync_skip_count in pg_stat_replication_slots?
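
To make the split concrete, here is a rough, self-contained sketch (not a patch, and not PostgreSQL code). It assumes the enum quoted above plus one extra value for the "WAL not yet flushed on standby" case. RS_SYNC_SKIP_WAL_NOT_FLUSHED, SlotSyncInfo, sync_skip_reason_name(), record_sync_attempt() and the reason strings are all hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Skip reasons as proposed upthread. RS_SYNC_SKIP_WAL_NOT_FLUSHED is a
 * hypothetical addition for the remote confirmed_lsn > latestFlushPtr case.
 */
typedef enum ReplicationSlotSyncSkipReason
{
    RS_SYNC_SKIP_NONE = 0,
    RS_SYNC_SKIP_REMOTE_BEHIND = (1 << 0),
    RS_SYNC_SKIP_DATA_LOSS = (1 << 1),
    RS_SYNC_SKIP_NO_SNAPSHOT = (1 << 2),
    RS_SYNC_SKIP_WAL_NOT_FLUSHED = (1 << 3)     /* hypothetical */
} ReplicationSlotSyncSkipReason;

/* Hypothetical per-slot bookkeeping, standing in for real slot state. */
typedef struct SlotSyncInfo
{
    /* current state: would back pg_replication_slots.sync_skip_reason */
    ReplicationSlotSyncSkipReason last_skip_reason;
    /* cumulative: would back pg_stat_replication_slots.sync_skip_count */
    uint64_t    skip_count;
} SlotSyncInfo;

/* Text for the view column; NULL means "not skipped" (SQL NULL). */
const char *
sync_skip_reason_name(ReplicationSlotSyncSkipReason reason)
{
    switch (reason)
    {
        case RS_SYNC_SKIP_NONE:
            return NULL;
        case RS_SYNC_SKIP_REMOTE_BEHIND:
            return "remote_behind";
        case RS_SYNC_SKIP_DATA_LOSS:
            return "data_loss";
        case RS_SYNC_SKIP_NO_SNAPSHOT:
            return "no_snapshot";
        case RS_SYNC_SKIP_WAL_NOT_FLUSHED:
            return "wal_not_flushed";
    }
    return "unknown";
}

/* Record the outcome of one sync cycle for a slot. */
void
record_sync_attempt(SlotSyncInfo *info, ReplicationSlotSyncSkipReason reason)
{
    assert(info != NULL);
    info->last_skip_reason = reason;    /* overwritten on every cycle */
    if (reason != RS_SYNC_SKIP_NONE)
        info->skip_count++;             /* only ever grows */
}
```

The point of the sketch is that the reason is overwritten on each sync cycle (so it describes the slot's current state), while the count only accumulates, which is why the two might reasonably live in different views.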

--
With Regards,
Amit Kapila.


