Re: How can end users know the cause of LR slot sync delays? - Mailing list pgsql-hackers
From | Shlok Kyal |
---|---|
Subject | Re: How can end users know the cause of LR slot sync delays? |
Date | |
Msg-id | CANhcyEVFZN2Mkjs0QHshKm2_3AkQ0eufjkD12eL2MeuVkPyGbw@mail.gmail.com Whole thread Raw |
In response to | RE: How can end users know the cause of LR slot sync delays? ("Hayato Kuroda (Fujitsu)" <kuroda.hayato@fujitsu.com>) |
List | pgsql-hackers |
On Wed, 17 Sept 2025 at 16:24, Hayato Kuroda (Fujitsu) <kuroda.hayato@fujitsu.com> wrote: > > Dear Shlok, > > Thanks for creating the patch. Personally I prefer approach2; approach1 cannot > indicate the current status of synchronization, it just shows the history. > I feel approach2 has more information than approach1. > I agree with your point. I think the preferred behaviour of slot_sync_skip_reason is to indicate the current status of synchronization. And taking account the current behaviour of columns of views pg_replication_slots and pg_stat_replication_slots, I think slot_sync_skip_reason is suited for pg_replication_slots view and I am proceeding ahead with approach2 > Below are my comments for your patch. > > 01. > ``` > + Number of times the slot sync is skipped. > ``` > > "slot sync" should be "slot synchronization". Same thing can be saied for other > attributes. > > 02. > ``` > + Reason of the last slot sync skip. > ``` > > Possible values must be clarified. > > 03. > ``` > + s.slot_sync_skip_count, > + s.last_slot_sync_skip, > + s.slot_sync_skip_reason, > ``` > > Last line has tab-blank but others have space-blank. > > 04. > ``` > +typedef enum SlotSyncSkipReason > +{ > + SLOT_SYNC_SKIP_NONE, /* No skip */ > + SLOT_SYNC_SKIP_STANDBY_BEHIND, /* Standby is behind the remote slot */ > + SLOT_SYNC_SKIP_REMOTE_BEHIND, /* Remote slot is behind the local slot */ > + SLOT_SYNC_SKIP_NO_CONSISTENT_SNAPSHOT /* Standby could not reach a > + * consistent snapshot */ > +} SlotSyncSkipReason > ``` > > a. > Can we add comment atop the enum? > > b. > SLOT_SYNC_SKIP_STANDBY_BEHIND is misleading; it indicates the standby server has > not received WALs required by slots, but enum does not clarify it. > How about SLOT_SYNC_SKIP_MISSING_WAL_RECORDS or something? Better naming are very > welcome. > > c > s/reach/build/. > > 05. > ``` > @@ -646,11 +652,13 @@ synchronize_one_slot(RemoteSlot *remote_slot, Oid remote_dbid) > remote_slot->name, > LSN_FORMAT_ARGS(latestFlushPtr))); > > - return false; > + /* If the slot is not present on the local */ > + if (!(slot = SearchNamedReplicationSlot(remote_slot->name, true))) > + return false; > } > ``` > > Looks like if users try to sync slots via SQL interfaces, the statistics cannot > be updated. > > 06. update_slot_sync_skip_stats > ``` > + /* Update the slot sync reason */ > + SpinLockAcquire(&slot->mutex); > + slot->slot_sync_skip_reason = skip_reason; > + SpinLockRelease(&slot->mutex); > ``` > > I feel no need to update the reason if the slot->slot_sync_skip_reason > and skip_reason are the same. > > 07. synchronize_one_slot > ``` > + /* > + * If standby is behind remote slot and the synced slot is present on > + * local. > + */ > + if (remote_slot->confirmed_lsn > latestFlushPtr) > + { > + if (synced) > + update_slot_sync_skip_stats(slot, skip_reason); > + return false; > + } > ``` > > This condition exist in the same function; can we combine? > > 08. GetSlotSyncSkipReason() > > Do we have to do pstrdup() here? I found a similar function get_snapbuild_state_desc(), > and it does not use. > I check similar functions and I think we can remove pstrup. Removed in the latest patch. > 09. > > Can you consider a test for added codes? Added test in 0002 patch I have also addressed remaining comments and attached the latest version of patch. Thanks, Shlok Kyal
Attachment
pgsql-hackers by date: