Re: Synchronizing slots from primary to standby - Mailing list pgsql-hackers

From Amit Kapila
Subject Re: Synchronizing slots from primary to standby
Date
Msg-id CAA4eK1Ldhh_kf-qG-m5BKY0R1SkdBSx5j+Ezwpie+H9GPWWOYA@mail.gmail.com
Whole thread Raw
In response to Re: Synchronizing slots from primary to standby  (shveta malik <shveta.malik@gmail.com>)
Responses Re: Synchronizing slots from primary to standby
List pgsql-hackers
On Wed, Feb 7, 2024 at 5:32 PM shveta malik <shveta.malik@gmail.com> wrote:
>
> Sure, made the suggested function name changes. Since there is no
> other change, I kept the version as v80_2.
>

Few comments on 0001
===================
1.
+ * the slots on the standby and synchronize them. This is done on every call
+ * to SQL function pg_sync_replication_slots.
>

I think the second sentence can be slightly changed to: "This is done
by a call to SQL function pg_sync_replication_slots." or "One can call
SQL function pg_sync_replication_slots to invoke this functionality."

2.
+update_local_synced_slot(RemoteSlot *remote_slot, Oid remote_dbid)
{
...
+ SpinLockAcquire(&slot->mutex);
+ slot->data.plugin = plugin_name;
+ slot->data.database = remote_dbid;
+ slot->data.two_phase = remote_slot->two_phase;
+ slot->data.failover = remote_slot->failover;
+ slot->data.restart_lsn = remote_slot->restart_lsn;
+ slot->data.confirmed_flush = remote_slot->confirmed_lsn;
+ slot->data.catalog_xmin = remote_slot->catalog_xmin;
+ slot->effective_catalog_xmin = remote_slot->catalog_xmin;
+ SpinLockRelease(&slot->mutex);
+
+ if (remote_slot->catalog_xmin != slot->data.catalog_xmin)
+ ReplicationSlotsComputeRequiredXmin(false);
+
+ if (remote_slot->restart_lsn != slot->data.restart_lsn)
+ ReplicationSlotsComputeRequiredLSN();
...
}

How is it possible that after assigning the values from remote_slot
they can differ from local slot values?

3.
+ /*
+ * Find the oldest existing WAL segment file.
+ *
+ * Normally, we can determine it by using the last removed segment
+ * number. However, if no WAL segment files have been removed by a
+ * checkpoint since startup, we need to search for the oldest segment
+ * file currently existing in XLOGDIR.
+ */
+ oldest_segno = XLogGetLastRemovedSegno() + 1;
+
+ if (oldest_segno == 1)
+ oldest_segno = XLogGetOldestSegno(0);

I feel this way isn't there a risk that XLogGetOldestSegno() will get
us the seg number from some previous timeline which won't make sense
to compare segno in reserve_wal_for_local_slot. Shouldn't you need to
fetch the current timeline and send as a parameter to this function as
that is the timeline on which standby is communicating with primary.

4.
+ if (remote_slot->confirmed_lsn > latestFlushPtr)
+ ereport(ERROR,
+ errmsg("skipping slot synchronization as the received slot sync"

I think the internal errors should be reported with elog as you have
done at other palces in the patch.


5.
+synchronize_one_slot(RemoteSlot *remote_slot, Oid remote_dbid)
{
...
+ /*
+ * Copy the invalidation cause from remote only if local slot is not
+ * invalidated locally, we don't want to overwrite existing one.
+ */
+ if (slot->data.invalidated == RS_INVAL_NONE)
+ {
+ SpinLockAcquire(&slot->mutex);
+ slot->data.invalidated = remote_slot->invalidated;
+ SpinLockRelease(&slot->mutex);
+
+ /* Make sure the invalidated state persists across server restart */
+ ReplicationSlotMarkDirty();
+ ReplicationSlotSave();
+ slot_updated = true;
+ }
...
}

Do we need to copy the 'invalidated' from remote to local if both are
same? I think this will happen for each slot each time because
normally slots won't be invalidated ones, so there is needless writes.

6.
+ * Returns TRUE if any of the slots gets updated in this sync-cycle.
+ */
+static bool
+synchronize_slots(WalReceiverConn *wrconn)
...
...

+void
+SyncReplicationSlots(WalReceiverConn *wrconn)
+{
+ PG_TRY();
+ {
+ validate_primary_slot_name(wrconn);
+
+ (void) synchronize_slots(wrconn);

For the purpose of 0001, synchronize_slots() doesn't seems to use
return value. So, I suggest to change it accordingly and move the
return value in the required patch.

7.
+ /*
+ * The primary_slot_name is not set yet or WALs not received yet.
+ * Synchronization is not possible if the walreceiver is not started.
+ */
+ latestWalEnd = GetWalRcvLatestWalEnd();
+ SpinLockAcquire(&WalRcv->mutex);
+ if ((WalRcv->slotname[0] == '\0') ||
+ XLogRecPtrIsInvalid(latestWalEnd))
+ {
+ SpinLockRelease(&WalRcv->mutex);
+ return false;

For the purpose of 0001, we should give WARNING here.

--
With Regards,
Amit Kapila.



pgsql-hackers by date:

Previous
From: "Jingxian Li"
Date:
Subject: Re: [PATCH] LockAcquireExtended improvement
Next
From: Peter Eisentraut
Date:
Subject: Re: Make documentation builds reproducible