RE: Synchronizing slots from primary to standby - Mailing list pgsql-hackers
From | Zhijie Hou (Fujitsu) |
---|---|
Subject | RE: Synchronizing slots from primary to standby |
Date | |
Msg-id | OS0PR01MB5716B382E3DA4B79C937BC90945C2@OS0PR01MB5716.jpnprd01.prod.outlook.com Whole thread Raw |
In response to | Re: Synchronizing slots from primary to standby (Peter Smith <smithpb2250@gmail.com>) |
Responses |
Re: Synchronizing slots from primary to standby
|
List | pgsql-hackers |
On Sunday, March 3, 2024 7:47 AM Peter Smith <smithpb2250@gmail.com> wrote: > > On Sat, Mar 2, 2024 at 2:51 PM Zhijie Hou (Fujitsu) <houzj.fnst@fujitsu.com> > wrote: > > > > On Friday, March 1, 2024 12:23 PM Peter Smith <smithpb2250@gmail.com> > wrote: > > > > ... > > > ====== > > > src/backend/replication/slot.c > > > > > > 2. validate_standby_slots > > > > > > + else if (!ReplicationSlotCtl) > > > + { > > > + /* > > > + * We cannot validate the replication slot if the replication slots' > > > + * data has not been initialized. This is ok as we will validate > > > + the > > > + * specified slot when waiting for them to catch up. See > > > + * StandbySlotsHaveCaughtup for details. > > > + */ > > > + } > > > + else > > > + { > > > + /* > > > + * If the replication slots' data have been initialized, verify if > > > + the > > > + * specified slots exist and are logical slots. > > > + */ > > > + LWLockAcquire(ReplicationSlotControlLock, LW_SHARED); > > > > > > IMO that 2nd comment does not need to say "If the replication slots' > > > data have been initialized," because that is implicit from the if/else. > > > > Removed. > > I only meant to suggest removing the 1st part, not the entire comment. > I thought it is still useful to have a comment like: > > /* Check that the specified slots exist and are logical slots.*/ OK, I misunderstood. Fixed. > > ====== > > And, here are some review comments for v103-0001. Thanks for the comments. > > ====== > Commit message > > 1. > Additionally, The SQL functions pg_logical_slot_get_changes, > pg_logical_slot_peek_changes and pg_replication_slot_advance are modified > to wait for the replication slots mentioned in 'standby_slot_names' to catch up > before returning. > > ~ > > (use the same wording as previous in this message) > > /mentioned in/specified in/ Changed. > > ====== > doc/src/sgml/config.sgml > > 2. > + Additionally, when using the replication management functions > + <link linkend="pg-replication-slot-advance"> > + <function>pg_replication_slot_advance</function></link>, > + <link linkend="pg-logical-slot-get-changes"> > + <function>pg_logical_slot_get_changes</function></link>, and > + <link linkend="pg-logical-slot-peek-changes"> > + <function>pg_logical_slot_peek_changes</function></link>, > + with failover enabled logical slots, the functions will wait for the > + physical slots specified in > <varname>standby_slot_names</varname> to > + confirm WAL receipt before proceeding. > + </para> > > It says "the ... functions" twice so maybe reword it slightly. > > BEFORE > Additionally, when using the replication management functions > pg_replication_slot_advance, pg_logical_slot_get_changes, and > pg_logical_slot_peek_changes, with failover enabled logical slots, the functions > will wait for the physical slots specified in standby_slot_names to confirm WAL > receipt before proceeding. > > SUGGESTION > Additionally, the replication management functions > pg_replication_slot_advance, pg_logical_slot_get_changes, and > pg_logical_slot_peek_changes, when used with failover enabled logical slots, > will wait for the physical slots specified in standby_slot_names to confirm WAL > receipt before proceeding. > > (Actually the "will wait ... before proceeding" is also a bit tricky, so below is > another possible rewording) > > SUGGESTION #2 > Additionally, the replication management functions > pg_replication_slot_advance, pg_logical_slot_get_changes, and > pg_logical_slot_peek_changes, when used with failover enabled logical slots, > will block until all physical slots specified in standby_slot_names have > confirmed WAL receipt. I prefer the #2 version. > > ~~~ > > 3. > + <note> > + <para> > + Value <literal>*</literal> is not accepted as it is inappropriate to > + block logical replication for physical slots that either lack > + associated standbys or have standbys associated that are not > enabled > + for replication slot synchronization. (see > + <xref > linkend="logicaldecoding-replication-slots-synchronization"/>). > + </para> > + </note> > > Why does the document need to provide an excuse/reason for the rule? > You could just say something like: > > SUGGESTION > The slots must be named explicitly. For example, specifying wildcard values like > <literal>*</literal> is not permitted. As suggested by Amit, I moved this to code comments. > > ====== > doc/src/sgml/func.sgml > > 4. > @@ -28150,7 +28150,7 @@ postgres=# SELECT '0/0'::pg_lsn + > pd.segment_number * ps.setting::int + :offset > </row> > > <row> > - <entry role="func_table_entry"><para role="func_signature"> > + <entry id="pg-logical-slot-get-changes" > role="func_table_entry"><para role="func_signature"> > <indexterm> > <primary>pg_logical_slot_get_changes</primary> > </indexterm> > @@ -28177,7 +28177,7 @@ postgres=# SELECT '0/0'::pg_lsn + > pd.segment_number * ps.setting::int + :offset > </row> > > <row> > - <entry role="func_table_entry"><para role="func_signature"> > + <entry id="pg-logical-slot-peek-changes" > role="func_table_entry"><para role="func_signature"> > <indexterm> > <primary>pg_logical_slot_peek_changes</primary> > </indexterm> > @@ -28232,7 +28232,7 @@ postgres=# SELECT '0/0'::pg_lsn + > pd.segment_number * ps.setting::int + :offset > </row> > > <row> > - <entry role="func_table_entry"><para role="func_signature"> > + <entry id="pg-replication-slot-advance" > role="func_table_entry"><para role="func_signature"> > <indexterm> > <primary>pg_replication_slot_advance</primary> > </indexterm> > > Should these 3 functions say something about how their behaviour is affected > by 'standby_slot_names' and give a link back to the GUC 'standby_slot_names' > docs? I added the info for pg_logical_slot_get_changes() and pg_replication_slot_advance(). The pg_logical_slot_peek_changes() function clarifies that it behaves just like pg_logical_slot_get_changes(), so I didn’t touch it. > > ====== > src/backend/replication/slot.c > > 5. StandbySlotsHaveCaughtup > > + if (!slot) > + { > + /* > + * If the provided slot does not exist, report a message and exit > + * the loop. It is possible for a user to specify a slot name in > + * standby_slot_names that does not exist just before the server > + * startup. The GUC check_hook(validate_standby_slots) cannot > + * validate such a slot during startup as the ReplicationSlotCtl > + * shared memory is not initialized at that time. It is also > + * possible for a user to drop the slot in standby_slot_names > + * afterwards. > + */ > > 5a. > Minor rewording to make this code comment more similar to the next one: > > SUGGESTION > If a slot name provided in standby_slot_names does not exist, report a message > and exit the loop. A user can specify a slot name that does not exist just before > the server startup... Changed. > > ~ > > 5b. > + /* > + * If a logical slot name is provided in standby_slot_names, > + * report a message and exit the loop. Similar to the non-existent > + * case, it is possible for a user to specify a logical slot name > + * in standby_slot_names before the server startup, or drop an > + * existing physical slot and recreate a logical slot with the > + * same name. > + */ > > /it is possible for a user to specify/a user can specify/ Changed. > > ~~~ > > 6. WaitForStandbyConfirmation > > + /* > + * We wait for the slots in the standby_slot_names to catch up, but we > + * use a timeout (1s) so we can also check if the standby_slot_names > + * has been changed. > + */ > > Remove some of the "we". > > SUGGESTION > Wait for the slots in the standby_slot_names to catch up, but use a timeout (1s) > so we can also check if the standby_slot_names has been changed. Changed. > > ====== > src/backend/replication/walsender.c > > 7. NeedToWaitForStandby > > +/* > + * Returns true if not all standbys have caught up to the flushed > +position > + * (flushed_lsn) when failover_slot is true; otherwise, returns false. > + * > + * If returning true, the function sets the appropriate wait event in > + * wait_event; otherwise, wait_event is set to 0. > + */ > > The function comment refers to 'failover_slot' but IMO needs to be worded > differently because 'failover_slot' is not a parameter anymore. Changed. > > ~~~ > > 8. NeedToWaitForWal > > +/* > + * Returns true if we need to wait for WALs to be flushed to disk, or > +if not > + * all standbys have caught up to the flushed position (flushed_lsn) > +when > + * failover_slot is true; otherwise, returns false. > + * > + * If returning true, the function sets the appropriate wait event in > + * wait_event; otherwise, wait_event is set to 0. > + */ > +static bool > +NeedToWaitForWal(XLogRecPtr target_lsn, XLogRecPtr flushed_lsn, > + uint32 *wait_event) > > Same as above -- Now that 'failover_slot' is not a function parameter, I thought > this should be reworded. Changed. > > ~~~ > > 9. NeedToWaitForWal > > + /* > + * Check if the standby slots have caught up to the flushed position. > + It > + * is good to wait up to flushed position and then let it send the > + changes > + * to logical subscribers one by one which are already covered in > + flushed > + * position without needing to wait on every change for standby > + * confirmation. Note that after receiving the shutdown signal, an > + ERROR > + * is reported if any slots are dropped, invalidated, or inactive. This > + * measure is taken to prevent the walsender from waiting indefinitely. > + */ > + if (NeedToWaitForStandby(target_lsn, flushed_lsn, wait_event)) return > + true; > > I felt it was confusing things for this function to also call to the other one -- it > seems an overlapping/muddling of the purpose of these. > I think it will be easier to understand if the calling code just calls to one or both > of these functions as required. Same as Amit, I didn't change this. > > ~~~ > > 10. > > - WalSndWait(wakeEvents, sleeptime, > WAIT_EVENT_WAL_SENDER_WAIT_FOR_WAL); > + WalSndWait(wakeEvents, sleeptime, wait_event); > > Tracing the assignments of the 'wait_event' is a bit tricky... IIUC it should not be > 0 when we got to this point, so maybe it is better to put Assert(wait_event) > before this call? Added. Best Regards, Hou zj
pgsql-hackers by date: