Re: Synchronizing slots from primary to standby - Mailing list pgsql-hackers

From Amit Kapila
Subject Re: Synchronizing slots from primary to standby
Date
Msg-id CAA4eK1KC9tb0AUKKnkqyuXXc-B89KQDk5NEWxhm15-dPG2pt3g@mail.gmail.com
Whole thread Raw
In response to Re: Synchronizing slots from primary to standby  (Amit Kapila <amit.kapila16@gmail.com>)
Responses RE: Synchronizing slots from primary to standby
List pgsql-hackers
On Thu, Feb 29, 2024 at 3:23 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> Few additional comments on the latest patch:
> =================================
> 1.
>  static XLogRecPtr
>  WalSndWaitForWal(XLogRecPtr loc)
> {
> ...
> + if (!XLogRecPtrIsInvalid(RecentFlushPtr) && loc <= RecentFlushPtr &&
> + (!replication_active || StandbyConfirmedFlush(loc, WARNING)))
> + {
> + /*
> + * Fast path to avoid acquiring the spinlock in case we already know
> + * we have enough WAL available and all the standby servers have
> + * confirmed receipt of WAL up to RecentFlushPtr. This is particularly
> + * interesting if we're far behind.
> + */
>   return RecentFlushPtr;
> + }
> ...
> ...
> + * Wait for WALs to be flushed to disk only if the postmaster has not
> + * asked us to stop.
> + */
> + if (loc > RecentFlushPtr && !got_STOPPING)
> + wait_event = WAIT_EVENT_WAL_SENDER_WAIT_FOR_WAL;
> +
> + /*
> + * Check if the standby slots have caught up to the flushed position.
> + * It is good to wait up to RecentFlushPtr and then let it send the
> + * changes to logical subscribers one by one which are already covered
> + * in RecentFlushPtr without needing to wait on every change for
> + * standby confirmation. Note that after receiving the shutdown signal,
> + * an ERROR is reported if any slots are dropped, invalidated, or
> + * inactive. This measure is taken to prevent the walsender from
> + * waiting indefinitely.
> + */
> + else if (replication_active &&
> + !StandbyConfirmedFlush(RecentFlushPtr, WARNING))
> + {
> + wait_event = WAIT_EVENT_WAIT_FOR_STANDBY_CONFIRMATION;
> + wait_for_standby = true;
> + }
> + else
> + {
> + /* Already caught up and doesn't need to wait for standby_slots. */
>   break;
> + }
> ...
> }
>
> Can we try to move these checks into a separate function that returns
> a boolean and has an out parameter as wait_event?
>
> 2. How about naming StandbyConfirmedFlush() as StandbySlotsAreCaughtup?
>

Some more comments:
==================
1.
+ foreach_ptr(char, name, elemlist)
+ {
+ ReplicationSlot *slot;
+
+ slot = SearchNamedReplicationSlot(name, true);
+
+ if (!slot)
+ {
+ GUC_check_errdetail("replication slot \"%s\" does not exist",
+ name);
+ ok = false;
+ break;
+ }
+
+ if (!SlotIsPhysical(slot))
+ {
+ GUC_check_errdetail("\"%s\" is not a physical replication slot",
+ name);
+ ok = false;
+ break;
+ }
+ }

Won't the second check need protection via ReplicationSlotControlLock?

2. In WaitForStandbyConfirmation(), we are anyway calling
StandbyConfirmedFlush() before the actual sleep in the loop, so does
calling it at the beginning of the function will serve any purpose? If
so, it is better to add some comments explaining the same.

3. Also do we need to perform the validation checks done in
StandbyConfirmedFlush() repeatedly when it is invoked in a loop? We
can probably separate those checks and perform them just once.

4.
+   *
+   * XXX: If needed, this can be attempted in future.

Remove this part of the comment.

--
With Regards,
Amit Kapila.



pgsql-hackers by date:

Previous
From: Alvaro Herrera
Date:
Subject: Re: SLRU optimization - configurable buffer pool and partitioning the SLRU lock
Next
From: shveta malik
Date:
Subject: Re: Synchronizing slots from primary to standby