RE: Synchronizing slots from primary to standby - Mailing list pgsql-hackers

From Zhijie Hou (Fujitsu)
Subject RE: Synchronizing slots from primary to standby
Msg-id OS0PR01MB5716CD1AB0090155B2ED5C3F945C2@OS0PR01MB5716.jpnprd01.prod.outlook.com
In response to Re: Synchronizing slots from primary to standby  (Amit Kapila <amit.kapila16@gmail.com>)
Responses Re: Synchronizing slots from primary to standby
List pgsql-hackers
On Saturday, March 2, 2024 6:55 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> 
> On Sat, Mar 2, 2024 at 9:21 AM Zhijie Hou (Fujitsu) <houzj.fnst@fujitsu.com>
> wrote:
> >
> > Apart from the comments, the code in WalSndWaitForWal was refactored a
> > bit to make it neater. Thanks, Shveta, for helping write the code and docs.
> >
> 
> A few more comments:

Thanks for the comments.

> ==================
> 1.
> +# Wait until the primary server logs a warning indicating that it is
> +# waiting for the sb1_slot to catch up.
> +$primary->wait_for_log(
> + qr/replication slot \"sb1_slot\" specified in parameter standby_slot_names does not have active_pid/,
> + $offset);
> 
> Shouldn't we wait for such a LOG in the first test as well, which involves two
> standbys and two logical subscribers?

Yes, we should. Added.

> 
> 2.
> +##################################################
> +# Test that logical replication will wait for the user-created inactive
> +# physical slot to catch up until we remove the slot from standby_slot_names.
> +##################################################
> 
> 
> I don't see anything tested here that we don't already test in the first
> test involving two standbys and two logical subscribers. Can you please
> clarify if I am missing something?

I think the intention was to test that the wait loop ends because of a GUC
config reload, while the first test covers the case where the loop ends because
of restart_lsn movement. But it seems we already test the config reload with
xx_get_changes(), so I can remove this test if you agree.

> 
> 3.
> Note that after receiving the shutdown signal, an ERROR
> + * is reported if any slots are dropped, invalidated, or inactive. This
> + * measure is taken to prevent the walsender from waiting indefinitely.
> + */
> + if (NeedToWaitForStandby(target_lsn, flushed_lsn, wait_event))
> 
> Shouldn't this part of the comment be moved inside
> NeedToWaitForStandby()?

Moved.

> 
> 4.
> + /*
> + * Update our idea of the currently flushed position only if we are
> + * not waiting for standbys to catch up, otherwise the standby would
> + * have to catch up to a newer WAL location in each cycle.
> + */
> + if (wait_event != WAIT_EVENT_WAIT_FOR_STANDBY_CONFIRMATION)
> + {
> 
> This functionality (in function WalSndWaitForWal()) seems to ensure that we
> first wait for the required WAL to be flushed and then wait for standbys. If true,
> we should cover that point in the comments here or somewhere in the function
> WalSndWaitForWal().
> 
> Apart from this, I have made a few modifications in the comments.

Thanks. I have reviewed and merged them.

Here is the V104 patch, which addresses the above comments and Peter's comments.

Best Regards,
Hou zj

