From Amit Kapila
Subject Re: Synchronizing slots from primary to standby
Msg-id CAA4eK1Ko-EBBDkea2R8V8PeveGg10PBswCF7JQdnRu+MJP+YBQ@mail.gmail.com
In response to Re: Synchronizing slots from primary to standby  (shveta malik <shveta.malik@gmail.com>)
List pgsql-hackers
On Fri, Dec 15, 2023 at 11:03 AM shveta malik <shveta.malik@gmail.com> wrote:
>
> Sorry, I missed attaching the patch. PFA v48.
>

Few comments on v48_0002
========================
1.
+static void
+slotsync_reread_config(WalReceiverConn *wrconn)
{
...
+    pfree(old_primary_conninfo);
+    pfree(old_primary_slotname);
+
+    if (restart)
+    {
+        char       *msg = "slot sync worker will restart because of a parameter change";
+
+        /*
+         * The exit code 1 will make postmaster restart the slot sync worker.
+         */
+        slotsync_worker_exit(msg, 1 /* proc_exit code */ );
+    }
...

I don't see the need to explicitly pfree when we are exiting the
process anyway, because the memory will be released on exit. That also
lets us avoid the 'restart' variable. Also, directly exiting here
probably makes sense, and likewise at the other place where this
function is used. I see that maybe_reread_subscription() exits with
code 0 and the apply worker still restarts, so why use a different
exit code here?
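
To illustrate, a rough sketch of the simplification (conninfo_changed
and slotname_changed are just placeholders for whatever comparison the
function already does):

    if (conninfo_changed || slotname_changed)
    {
        /*
         * No explicit pfree() of the old values; the memory is released on
         * process exit anyway, and we don't need a separate 'restart' flag.
         * Whether the exit code should be 0 (as in maybe_reread_subscription())
         * or 1 is the question above.
         */
        slotsync_worker_exit("slot sync worker will restart because of a parameter change",
                             1 /* proc_exit code */ );
    }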

2.
+static void
+check_primary_info(WalReceiverConn *wrconn, bool *am_cascading_standby)
{
...
+    remote_in_recovery = DatumGetBool(slot_getattr(tupslot, 1, &isnull));
+    Assert(!isnull);
+
+    /* No need to check further, return that we are cascading standby */
+    if (remote_in_recovery)
+    {
+        *am_cascading_standby = true;
+        CommitTransactionCommand();
+        return;
...
}

Don't we need to clear the result and tuple in case of early return?
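
Something along these lines, assuming the result and tuple variables
are named res and tupslot as elsewhere in the patch:

    if (remote_in_recovery)
    {
        *am_cascading_standby = true;

        /* Release the tuple and the result on this early-return path too. */
        ExecClearTuple(tupslot);
        walrcv_clear_result(res);

        CommitTransactionCommand();
        return;
    }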

3. It would be a good idea to mention requirements like a physical
slot on the primary, hot_standby_feedback, etc. in the commit message.

4.
+static bool
+wait_for_primary_slot_catchup(WalReceiverConn *wrconn, RemoteSlot *remote_slot,
+                              bool *wait_attempts_exceeded)
{
...
+    tupslot = MakeSingleTupleTableSlot(res->tupledesc, &TTSOpsMinimalTuple);
+    if (!tuplestore_gettupleslot(res->tuplestore, true, false, tupslot))
+    {
+        ereport(WARNING,
+                (errmsg("slot \"%s\" creation aborted", remote_slot->name),
+                 errdetail("This slot was not found on the primary server")));
...
+    /*
+     * It is possible to get null values for LSN and Xmin if slot is
+     * invalidated on the primary server, so handle accordingly.
+     */
+    new_invalidated = DatumGetBool(slot_getattr(tupslot, 1, &isnull));
+    Assert(!isnull);
+
+    new_restart_lsn = DatumGetLSN(slot_getattr(tupslot, 2, &isnull));
+    if (new_invalidated || isnull)
+    {
+        ereport(WARNING,
+                (errmsg("slot \"%s\" creation aborted", remote_slot->name),
+                 errdetail("This slot was invalidated on the primary server")));
...
}

a. The errdetail message should end with a full stop. Please check all
other errdetail messages in the patch and make them follow the same
guideline.
b. I think saying the slot creation was aborted is not completely
correct because we would already have created the slot, especially
when it is in the 'i' state. Can we change it to something like:
"aborting initial sync for slot \"%s\""? (See the sketch after point c.)
c. Also, if the remote_slot is invalidated, ideally we could even drop
the local slot, but it seems the patch will drop it before the next
sync cycle along with any other slots that need to be dropped. If so,
can we add a comment to indicate that?
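
For (a) and (b) together, something like this (the exact wording is
just a suggestion):

        ereport(WARNING,
                (errmsg("aborting initial sync for slot \"%s\"",
                        remote_slot->name),
                 errdetail("This slot was invalidated on the primary server.")));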

5.
+static void
+local_slot_update(RemoteSlot *remote_slot)
+{
+    Assert(MyReplicationSlot->data.invalidated == RS_INVAL_NONE);
+
+    LogicalConfirmReceivedLocation(remote_slot->confirmed_lsn);
+    LogicalIncreaseXminForSlot(remote_slot->confirmed_lsn,
+                               remote_slot->catalog_xmin);
+    LogicalIncreaseRestartDecodingForSlot(remote_slot->confirmed_lsn,
+                                          remote_slot->restart_lsn);
+
+    SpinLockAcquire(&MyReplicationSlot->mutex);
+    MyReplicationSlot->data.invalidated = remote_slot->invalidated;
+    SpinLockRelease(&MyReplicationSlot->mutex);
...
...

If required, the invalidated flag is updated in the caller as well, so
why do we need to update it here too?
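
In other words, something like this trimmed version (just a sketch),
leaving the invalidated flag entirely to the caller:

static void
local_slot_update(RemoteSlot *remote_slot)
{
    Assert(MyReplicationSlot->data.invalidated == RS_INVAL_NONE);

    LogicalConfirmReceivedLocation(remote_slot->confirmed_lsn);
    LogicalIncreaseXminForSlot(remote_slot->confirmed_lsn,
                               remote_slot->catalog_xmin);
    LogicalIncreaseRestartDecodingForSlot(remote_slot->confirmed_lsn,
                                          remote_slot->restart_lsn);

    /* 'invalidated' is left to the caller to set, if required. */
    ...
}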

--
With Regards,
Amit Kapila.


