RE: Synchronizing slots from primary to standby - Mailing list pgsql-hackers
From | Zhijie Hou (Fujitsu) |
---|---|
Subject | RE: Synchronizing slots from primary to standby |
Date | |
Msg-id | OS0PR01MB57162551460DBAE9269FEF86944F2@OS0PR01MB5716.jpnprd01.prod.outlook.com Whole thread Raw |
In response to | Re: Synchronizing slots from primary to standby (Bertrand Drouvot <bertranddrouvot.pg@gmail.com>) |
List | pgsql-hackers |
On Monday, February 12, 2024 6:03 PM Bertrand Drouvot <bertranddrouvot.pg@gmail.com> wrote: > > Hi, > > On Sun, Feb 11, 2024 at 01:23:19PM +0000, Zhijie Hou (Fujitsu) wrote: > > On Saturday, February 10, 2024 9:10 PM Amit Kapila > <amit.kapila16@gmail.com> wrote: > > > > > > On Sat, Feb 10, 2024 at 5:31 PM Masahiko Sawada > > > <sawada.mshk@gmail.com> > > > wrote: > > > > > > > > On Fri, Feb 9, 2024 at 4:08 PM Zhijie Hou (Fujitsu) > > > > <houzj.fnst@fujitsu.com> wrote: > > > > > > > > > Another alternative is to register the callback when calling > > > > > slotsync functions and unregister it after the function call. > > > > > And register the callback in > > > > > slotsyncworkmain() for the slotsync worker patch, although this > > > > > may adds a few more codes. > > > > > > > > Another idea is that SyncReplicationSlots() calls > > > > synchronize_slots() in PG_ENSURE_ERROR_CLEANUP() block instead of > > > > PG_TRY(), to make sure to clear the flag in case of ERROR or > > > > FATAL. And the slotsync worker uses the before_shmem_callback to clear > the flag. > > > > > > > > > > +1. This sounds like a better way to clear the flag. > > > > Agreed. Here is the V84 patch which addressed this. > > > > Apart from above, I removed the txn start/end codes from 0001 as they > > are used in the slotsync worker patch. And I also ran pgindent and > > pgperltidy for the patch. > > > > Thanks! > > A few random comments: Thanks for the comments. > > 001 === > > " > For > the synchronization to work, it is mandatory to have a physical replication slot > between the primary and the standby, " > > Maybe mention "primary_slot_name" here? Added. > > 002 === > > + <para> > + Synchronize the logical failover slots from the primary server to the > standby server. > > should we say "logical failover replication slots" instead? Changed. > > 003 === > > + If, after executing the function, > + <link linkend="guc-hot-standby-feedback"> > + <varname>hot_standby_feedback</varname></link> is disabled > on > + the standby or the physical slot configured in > + <link linkend="guc-primary-slot-name"> > + <varname>primary_slot_name</varname></link> is > + removed, > > I think another option that could lead to slot invalidation is if primary_slot_name > is NULL or miss-configured. Indeed hot_standby_feedback would be working > (for the catalog_xmin) but only as long as the standby is up and running. I didn't change this based on the discussion. > > 004 === > > + on the standby. For the synchronization to work, it is mandatory to > + have a physical replication slot between the primary and the > + standby, > > should we mention primary_slot_name here? Added. > > 005 === > > + To resume logical replication after failover from the synced logical > + slots, the subscription's 'conninfo' must be altered > > Only in a pub/sub context but not for other ways of using the logical replication > slot(s). I am not very sure about this, because the 3-rd part logicalrep can also have their own replication origin, so I didn't change for now, but will think over this. > > 006 === > > + neither be used for logical decoding nor dropped by the user > > what about "nor dropped manually"? Changed. > > 007 === > > +typedef struct SlotSyncCtxStruct > +{ > > Should we remove "Struct" from the struct name? The name was named based on some other comment to be consistent with LogicalReplCtxStruct, so I didn't change this. If other also prefer without struct, we can change it later. > 008 === > > + ereport(LOG, > + errmsg("dropped replication slot > \"%s\" of dbid %d", > + > NameStr(local_slot->data.name), > + > + local_slot->data.database)); > > We emit a message when an "invalidated" slot is dropped but not when we > create a slot. Shouldn't we emit a message when we create a synced slot on the > standby? > > I think that could be confusing to see "a drop" message not followed by "a > create" > one when it's expected (slot valid on the primary for example). I think we will report "sync-ready" for newly synced slot, for newly created temporary slots, I am not sure do we need to report log to them, because they will be dropped on promotion anyway. But if others also prefer to log, I am fine with that. > > 009 === > > Regarding 040_standby_failover_slots_sync.pl what about adding tests for? > > - synced slot invalidation (and ensure it's recreated once > pg_sync_replication_slots() is called and when the slot in primary is valid) Will try this in next version. > - cannot enable failover for a temporary replication slot Added. > - replication slots can only be synchronized from a standby server Added. Best Regards, Hou zj
pgsql-hackers by date: