RE: Synchronizing slots from primary to standby - Mailing list pgsql-hackers

From Zhijie Hou (Fujitsu)
Subject RE: Synchronizing slots from primary to standby
Date
Msg-id OS0PR01MB57165EA076D41985572E58BA94462@OS0PR01MB5716.jpnprd01.prod.outlook.com
In response to Re: Synchronizing slots from primary to standby  (Bertrand Drouvot <bertranddrouvot.pg@gmail.com>)
Responses Re: Synchronizing slots from primary to standby
List pgsql-hackers
On Friday, February 2, 2024 2:03 PM Bertrand Drouvot <bertranddrouvot.pg@gmail.com> wrote:
>
> Hi,
>
> On Thu, Feb 01, 2024 at 05:29:15PM +0530, shveta malik wrote:
> > Attached v75 patch-set. Changes are:
> >
> > 1) Re-arranged the patches:
> > 1.1) 'libpqrc' related changes (from v74-001 and v74-004) are
> > separated out in v75-001 as those are independent changes.
> > 1.2) 'Add logical slot sync capability', 'Slot sync worker as special
> > process' and 'App-name changes' are now merged to single patch which
> > makes v75-002.
> > 1.3) 'Wait for physical Standby confirmation' and 'Failover Validation
> > Document' patches are maintained as is (v75-003 and v75-004 now).
>
> Thanks!
>
> I only looked at the commit message for v75-0002 and see that it has changed
> since the comment done in [1], but it still does not look correct to me.
>
> "
> If a logical slot on the primary is valid but is invalidated on the standby, then
> that slot is dropped and recreated on the standby in next sync-cycle provided
> the slot still exists on the primary server. It is okay to recreate such slots as long
> as these are not consumable on the standby (which is the case currently). This
> situation may occur due to the following reasons:
> - The max_slot_wal_keep_size on the standby is insufficient to retain WAL
>   records from the restart_lsn of the slot.
> - primary_slot_name is temporarily reset to null and the physical slot is
>   removed.
> - The primary changes wal_level to a level lower than logical.
> "
>
> If a logical decoding slot "still exists on the primary server" then the primary
> can not change the wal_level to lower than logical, one would get something
> like:
>
> "FATAL:  logical replication slot "logical_slot" exists, but wal_level < logical"
>
> and then slots won't get invalidated on the standby. I've the feeling that the
> wal_level conflict part may need to be explained separately? (I think it's not
> possible that they end up being re-created on the standby for this conflict,
> they will be simply removed as it would mean the counterpart one on the
> primary does not exist anymore).

This is possible in some extreme cases, because the slot is synced
asynchronously.

For example, if wal_level on the primary is changed to 'replica' and then
back to 'logical', the standby will receive two XLOG_PARAMETER_CHANGE WAL
records. Before the standby replays these records, a user can create a failover
slot on the primary, since wal_level there is 'logical' again. If the slotsync
worker syncs that slot before the startup process replays the
XLOG_PARAMETER_CHANGE records, then replaying the first one (wal_level =
'replica') will invalidate the just-synced slot. Because the slot still exists
on the primary, it would then be dropped and re-created in the next sync cycle.
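To make the ordering concrete, here is a toy Python model of that interleaving.
It is not PostgreSQL code, and the class, method, and slot names are
hypothetical. It assumes only that the slotsync worker reads slot state from the
primary independently of WAL replay progress, and that replaying a wal_level
below 'logical' invalidates the synced logical slots on the standby:

```python
from dataclasses import dataclass, field

# Toy model only: not PostgreSQL source code. All names are hypothetical.
WAL_LEVELS = ["minimal", "replica", "logical"]


@dataclass
class Standby:
    # WAL records not yet replayed by the startup process,
    # each as ("XLOG_PARAMETER_CHANGE", new_wal_level).
    replay_queue: list
    # Synced slots on the standby: name -> "valid" or "invalidated".
    slots: dict = field(default_factory=dict)

    def slotsync_cycle(self, primary_slots):
        # The slotsync worker looks at the primary's slots directly, not at
        # replayed WAL. A slot that is missing or invalidated locally but
        # still exists on the primary is dropped (if present) and re-created.
        for name in primary_slots:
            if self.slots.get(name) != "valid":
                self.slots.pop(name, None)
                self.slots[name] = "valid"

    def replay_next(self):
        # The startup process replays one record; a wal_level below
        # 'logical' invalidates every synced logical slot.
        record, level = self.replay_queue.pop(0)
        below_logical = WAL_LEVELS.index(level) < WAL_LEVELS.index("logical")
        if record == "XLOG_PARAMETER_CHANGE" and below_logical:
            for name in self.slots:
                self.slots[name] = "invalidated"


def run_scenario():
    # The primary went 'replica' -> 'logical'; both changes are still
    # waiting in the standby's replay queue.
    standby = Standby(replay_queue=[
        ("XLOG_PARAMETER_CHANGE", "replica"),
        ("XLOG_PARAMETER_CHANGE", "logical"),
    ])
    # The primary is at 'logical' again, so a failover slot can be created.
    primary_slots = ["failover_slot"]
    trace = []

    standby.slotsync_cycle(primary_slots)  # synced before replay catches up
    trace.append(("synced", standby.slots["failover_slot"]))
    standby.replay_next()                  # replays wal_level = 'replica'
    trace.append(("replayed replica", standby.slots["failover_slot"]))
    standby.replay_next()                  # replays wal_level = 'logical'
    trace.append(("replayed logical", standby.slots["failover_slot"]))
    standby.slotsync_cycle(primary_slots)  # slot still exists on the primary
    trace.append(("next sync cycle", standby.slots["failover_slot"]))
    return trace


for step, state in run_scenario():
    print(f"{step}: {state}")
```

Running it shows the slot as valid, then invalidated after the 'replica'
record is replayed, still invalidated after the 'logical' record, and valid
again after the next sync cycle, because its counterpart on the primary was
never dropped.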

That said, this doesn't seem like a real-world case, so I am not sure it is
worth a separate explanation.

Best Regards,
Hou zj


