Re: How can end users know the cause of LR slot sync delays? - Mailing list pgsql-hackers

From Amit Kapila
Subject Re: How can end users know the cause of LR slot sync delays?
Date
Msg-id CAA4eK1KcbdXufQD=p3VA=BsbF3x6UhD4o8qHCXhF4jY6A4NnDw@mail.gmail.com
Whole thread Raw
In response to Re: How can end users know the cause of LR slot sync delays?  (Shlok Kyal <shlok.kyal.oss@gmail.com>)
List pgsql-hackers
On Fri, Nov 21, 2025 at 6:21 PM Shlok Kyal <shlok.kyal.oss@gmail.com> wrote:
>
> The Cbot complained that it was not able to build the docs. I have
> fixed it and attached the latest patch.
>

Few comments on 0001:
1.
+      <entry role="catalog_table_entry"><para role="column_definition">
+        <structfield>slotsync_last_skip_at</structfield><type>timestamp
with time zone</type>
+       </para>
+       <para>
+        Time at which last slot synchronization was skipped.

How important it is to use last in the above field? Isn't
slotsync_skip_at sufficient, as the description says all?

2.
+ /* If slot is present on the local, update the slot sync skip stats */
+ if ((slot = SearchNamedReplicationSlot(remote_slot->name, true)))
+ {
+ bool synced;
+
+ SpinLockAcquire(&slot->mutex);
+ synced = slot->data.synced;
+ SpinLockRelease(&slot->mutex);
+
+ if (synced)
+ {
+ ReplicationSlotAcquire(NameStr(slot->data.name), true, false);
+
+ pgstat_report_replslotsync_skip(slot);
+
+ ReplicationSlotRelease();
+ }
+ }

I think acquiring the slot just for reporting the stats sounds too
much. How about instead moving the check if
(remote_slot->confirmed_lsn > latestFlushPtr) in the later if/else
blocks where we have the slot acquire acquired?

In if block, we can move it just before below check
if (slot->data.persistency == RS_TEMPORARY)
{
slot_updated = update_and_persist_local_synced_slot(remote_slot,

In the else block, just before following
update_and_persist_local_synced_slot(remote_slot, remote_dbid);

3.
We can only get here if pgstat_create_replslot() or
+ * pgstat_acquire_replslot() have already been called.
+ */
+void
+pgstat_report_replslotsync_skip(ReplicationSlot *slot)

3A. Instead of having such a comment, it is better to have elog(ERROR,
.. or Assertion if the stats entry doesn't exist by this time.
3B. Also, let's name the function as pgstat_report_replslotsync().

4. I think the current test used in the patch is very complex as it
requires multiple server restarts. Instead, we can use a test similar
to what is being used in the patch proposed in the thread [1].

[1] - https://www.postgresql.org/message-id/CAFPTHDYgcvZ60eHWUav3VQSeVibivx7A31rp_pFAkMQrW%3Dj%3D5A%40mail.gmail.com

--
With Regards,
Amit Kapila.



pgsql-hackers by date:

Previous
From: Michael Paquier
Date:
Subject: Re: Add pg_buffercache_mark_dirty[_all] functions to the pg_buffercache
Next
From: shveta malik
Date:
Subject: Re: How can end users know the cause of LR slot sync delays?