Re: How can end users know the cause of LR slot sync delays? - Mailing list pgsql-hackers

From shveta malik
Subject Re: How can end users know the cause of LR slot sync delays?
Date
Msg-id CAJpy0uCBFGO6n_8g4TWzCOV0u60c5qXWpCE2PKiiW2yq_t7r5g@mail.gmail.com
Whole thread Raw
In response to Re: How can end users know the cause of LR slot sync delays?  (Shlok Kyal <shlok.kyal.oss@gmail.com>)
Responses Re: How can end users know the cause of LR slot sync delays?
List pgsql-hackers
On Tue, Nov 18, 2025 at 4:07 PM Shlok Kyal <shlok.kyal.oss@gmail.com> wrote:
>
> On Fri, 14 Nov 2025 at 14:13, Hayato Kuroda (Fujitsu)
> <kuroda.hayato@fujitsu.com> wrote:
> >
> > Dear Shlok,
> >
> > Thanks for updating the patch. Few more comments.
> >
> > > > > > I’m not sure if this has already been discussed; I couldn’t find any
> > > > > > mention of it in the thread. Why don’t we persist
> > > > > > 'slot_sync_skip_reason' (it is outside of
> > > > > > ReplicationSlotPersistentData)? If a slot wasn’t synced during the
> > > > > > last cycle and the server restarts, it would be helpful to know the
> > > > > > reason it wasn’t synced prior to the node restart.
> > > > > >
> > > Actually I did not think in this direction. I think it will be useful
> > > to persist 'slot_sync_skip_reason'. I have made the change for the
> > > same in the latest patch.
> >
> > Hmm, I'm wondering it should be written on the disk. Other attributes on the disk
> > are essential to decode or replicate changes correctly, but sync status is not
> > used for the purpose. Personally considered, slot sync would re-start soon after
> > the reboot so that it is OK to start with empty. How about others?
> >
> > If we want to serialize the info, we should do further tasks:
> >  - update SLOT_VERSION
> >  - make the slot dirty then SaveSlotToPath() when the status is updated.
> >
> I agree with your point. Slot synchronization will restart shortly
> after a reboot, so it seems reasonable to begin with an empty state
> rather than persisting slot_sync_skip_reason.
> For now, I’ve updated the patch so that slot_sync_skip_reason is no
> longer persisted; its initialization is kept outside of
> ReplicationSlotPersistentData. I’d also like to hear what others
> think.
>

Users may even use an API to synchronize the slots rather than
slotsync worker. In that case synchronization won't start immediately
after server-restart. Having said that, overall the statistical data
getting lost on restart doesn't seem right to me. But I would like to
know what others think.


> > ```
> > +static void
> > +update_slot_sync_skip_stats(ReplicationSlot *slot, SlotSyncSkipReason skip_reason)
> > +{
> > +       Assert(MyReplicationSlot);
> > ```
> >
> > I think no need to require *slot as an argument. We can use the variable to shorten
> > like update_local_synced_slot().
> >
> Fixed
>
> > ```
> > # Verify pg_sync_replication_slots is failing
> > ok( $stderr =~ /skipping slot synchronization because the received slot sync/,
> >         'pg_sync_replication_slots failed as expected');
> > ```
> >
> > This may be matter of taste, but can you check whole of log message? Latter part
> > indicates the actual reason.
> >
> The latter part of the message contains LSN values, which are not
> stable across runs. To avoid hard-coding specific LSNs, I matched the
> fixed, non-variable parts of the message while still covering the
> reason for the failure.
>
> > ```
> > # Detach injection point
> > $standby->safe_psql(
> >         'postgres', q{
> >         SELECT injection_points_detach('slot-sync-skip');
> >         SELECT injection_points_wakeup('slot-sync-skip');
> > });
> > ```
> >
> > Not mandatory, but you can quit the background session if you release the
> > injection point.
> Fixed
>
> Apart from the above changes. I have renamed the functions to
> consistently use the term 'slotsync' instead of 'slot_sync'
> update_slot_sync_skip_stats -> update_slotsync_skip_stats
> pgstat_report_replslot_sync_skip -> pgstat_report_replslotsync_skip
>
> I have attached the updated v8 patch with the latest changes.
>
> Thanks,
> Shlok Kyal



pgsql-hackers by date:

Previous
From: Ian Lawrence Barwick
Date:
Subject: [PATCH] psql: tab completion for ALTER ROLE ... IN DATABASE ...
Next
From: Chao Li
Date:
Subject: Re: Extended Statistics set/restore/clear functions.