Re: Synchronizing slots from primary to standby - Mailing list pgsql-hackers

From Amit Kapila
Subject Re: Synchronizing slots from primary to standby
Date
Msg-id CAA4eK1LJdmGATWG=xOD1CB9cogukk2cLNBGH8h-n-ZDJuwBdJg@mail.gmail.com
Whole thread Raw
In response to RE: Synchronizing slots from primary to standby  ("Zhijie Hou (Fujitsu)" <houzj.fnst@fujitsu.com>)
Responses RE: Synchronizing slots from primary to standby
List pgsql-hackers
On Mon, Feb 26, 2024 at 7:49 AM Zhijie Hou (Fujitsu)
<houzj.fnst@fujitsu.com> wrote:
>
> Attach the V98 patch set which addressed above comments.
>

Few comments:
=============
1.
 WalSndWaitForWal(XLogRecPtr loc)
 {
  int wakeEvents;
+ bool wait_for_standby = false;
+ uint32 wait_event;
+ List    *standby_slots = NIL;
  static XLogRecPtr RecentFlushPtr = InvalidXLogRecPtr;

+ if (MyReplicationSlot->data.failover && replication_active)
+ standby_slots = GetStandbySlotList(true);
+
  /*
- * Fast path to avoid acquiring the spinlock in case we already know we
- * have enough WAL available. This is particularly interesting if we're
- * far behind.
+ * Check if all the standby servers have confirmed receipt of WAL up to
+ * RecentFlushPtr even when we already know we have enough WAL available.
+ *
+ * Note that we cannot directly return without checking the status of
+ * standby servers because the standby_slot_names may have changed, which
+ * means there could be new standby slots in the list that have not yet
+ * caught up to the RecentFlushPtr.
  */
- if (RecentFlushPtr != InvalidXLogRecPtr &&
- loc <= RecentFlushPtr)
- return RecentFlushPtr;
+ if (!XLogRecPtrIsInvalid(RecentFlushPtr) && loc <= RecentFlushPtr)
+ {
+ FilterStandbySlots(RecentFlushPtr, &standby_slots);

I think even if the slot list is not changed, we will always process
each slot mentioned in standby_slot_names once. Can't we cache the
previous list of slots for we have already waited for? In that case,
we won't even need to copy the list via GetStandbySlotList() unless we
need to wait.

2.
 WalSndWaitForWal(XLogRecPtr loc)
 {
+ /*
+ * Update the standby slots that have not yet caught up to the flushed
+ * position. It is good to wait up to RecentFlushPtr and then let it
+ * send the changes to logical subscribers one by one which are
+ * already covered in RecentFlushPtr without needing to wait on every
+ * change for standby confirmation.
+ */
+ if (wait_for_standby)
+ FilterStandbySlots(RecentFlushPtr, &standby_slots);
+
  /* Update our idea of the currently flushed position. */
- if (!RecoveryInProgress())
+ else if (!RecoveryInProgress())
  RecentFlushPtr = GetFlushRecPtr(NULL);
  else
  RecentFlushPtr = GetXLogReplayRecPtr(NULL);
...
/*
* If postmaster asked us to stop, don't wait anymore.
*
* It's important to do this check after the recomputation of
* RecentFlushPtr, so we can send all remaining data before shutting
* down.
*/
if (got_STOPPING)
break;

I think because 'wait_for_standby' may not be set in the first or
consecutive cycles we may send the WAL to the logical subscriber
before sending it to the physical subscriber during shutdown.

--
With Regards,
Amit Kapila.



pgsql-hackers by date:

Previous
From: Amit Kapila
Date:
Subject: Re: Synchronizing slots from primary to standby
Next
From: torikoshia
Date:
Subject: Re: RFC: Logging plan of the running query