Re: Synchronizing slots from primary to standby - Mailing list pgsql-hackers
From | Peter Smith |
---|---|
Subject | Re: Synchronizing slots from primary to standby |
Date | |
Msg-id | CAHut+PvtysbVd8tj2AADk=eNo0VY9Ov9wkBP-K+9tj1wRS4M4w@mail.gmail.com Whole thread Raw |
In response to | RE: Synchronizing slots from primary to standby ("Zhijie Hou (Fujitsu)" <houzj.fnst@fujitsu.com>) |
Responses |
Re: Synchronizing slots from primary to standby
|
List | pgsql-hackers |
Here are some review comments for v79-0001 ====== Commit message 1. The logical replication slots on the primary can be synchronized to the hot standby by enabling the "failover" parameter during pg_create_logical_replication_slot() or by enabling "failover" option of the CREATE SUBSCRIPTION command and calling pg_sync_replication_slots() function on the standby. ~ SUGGESTION The logical replication slots on the primary can be synchronized to the hot standby by enabling failover during slot creation (e.g. using the "failover" parameter of pg_create_logical_replication_slot(), or using the "failover" option of the CREATE SUBSCRIPTION command), and then calling pg_sync_replication_slots() function on the standby. ====== 2. + <caution> + <para> + If after executing the function, hot_standby_feedback is disabled on + the standby or the physical slot configured in primary_slot_name is + removed, then it is possible that the necessary rows of the + synchronized slot will be removed by the VACUUM process on the primary + server, resulting in the synchronized slot becoming invalidated. + </para> + </caution> 2a. /If after/If, after/ ~ 2b. Use SGML <variable> for the GUC names (hot_standby_feedback, and primary_slot_name), and consider putting links for them as well. ====== src/sgml/logicaldecoding.sgml 3. + <sect2 id="logicaldecoding-replication-slots-synchronization"> + <title>Replication Slot Synchronization</title> + <para> + A logical replication slot on the primary can be synchronized to the hot + standby by enabling the <literal>failover</literal> option for the slot + and calling <function>pg_sync_replication_slots</function> + on the standby. The <literal>failover</literal> option of the slot + can be enabled either by enabling + <link linkend="sql-createsubscription-params-with-failover"> + <literal>failover</literal></link> + option during subscription creation or by providing <literal>failover</literal> + parameter during + <link linkend="pg-create-logical-replication-slot"> + <function>pg_create_logical_replication_slot</function></link>. IMO it will be better to slightly reword this (like was suggested for the Commit Message). I felt it is also better to refer/link to "CREATE SUBSCRIPTION" instead of saying "during subscription creation". SUGGESTION The logical replication slots on the primary can be synchronized to the hot standby by enabling failover during slot creation (e.g. using the "failover" parameter of pg_create_logical_replication_slot, or using the "failover" option of the CREATE SUBSCRIPTION command), and then calling pg_sync_replication_slots() function on the standby. ~~~ 4. + There are chances that the old primary is up again during the promotion + and if subscriptions are not disabled, the logical subscribers may keep + on receiving the data from the old primary server even after promotion + until the connection string is altered. This may result in the data + inconsistency issues and thus the logical subscribers may not be able + to continue the replication from the new primary server. + </para> 4a. /There are chances/There is a chance/ /may keep on receiving the data/may continue to receive data/ ~ 4b. BEFORE This may result in the data inconsistency issues and thus the logical subscribers may not be able to continue the replication from the new primary server. SUGGESTION This might result in data inconsistency issues, preventing the logical subscribers from being able to continue replication from the new primary server. ~ 4c. I felt this whole part "There is a chance..." should be rendered as a <note> or a <caution> or something. ====== src/backend/replication/logical/slotsync.c 5. +/* + * Return true if all necessary GUCs for slot synchronization are set + * appropriately, otherwise return false. + */ +bool +ValidateSlotSyncParams(void) +{ + char *dbname; + + /* + * A physical replication slot(primary_slot_name) is required on the + * primary to ensure that the rows needed by the standby are not removed + * after restarting, so that the synchronized slot on the standby will not + * be invalidated. + */ + if (PrimarySlotName == NULL || *PrimarySlotName == '\0') + { + ereport(ERROR, + /* translator: %s is a GUC variable name */ + errcode(ERRCODE_INVALID_PARAMETER_VALUE), + errmsg("bad configuration for slot synchronization"), + errhint("\"%s\" must be defined.", "primary_slot_name")); + return false; + } + + /* + * hot_standby_feedback must be enabled to cooperate with the physical + * replication slot, which allows informing the primary about the xmin and + * catalog_xmin values on the standby. + */ + if (!hot_standby_feedback) + { + ereport(ERROR, + /* translator: %s is a GUC variable name */ + errcode(ERRCODE_INVALID_PARAMETER_VALUE), + errmsg("bad configuration for slot synchronization"), + errhint("\"%s\" must be enabled.", "hot_standby_feedback")); + return false; + } + + /* + * Logical decoding requires wal_level >= logical and we currently only + * synchronize logical slots. + */ + if (wal_level < WAL_LEVEL_LOGICAL) + { + ereport(ERROR, + errcode(ERRCODE_INVALID_PARAMETER_VALUE), + errmsg("bad configuration for slot synchronization"), + errhint("\"wal_level\" must be >= logical.")); + return false; + } + + /* + * The primary_conninfo is required to make connection to primary for + * getting slots information. + */ + if (PrimaryConnInfo == NULL || *PrimaryConnInfo == '\0') + { + ereport(ERROR, + /* translator: %s is a GUC variable name */ + errcode(ERRCODE_INVALID_PARAMETER_VALUE), + errmsg("bad configuration for slot synchronization"), + errhint("\"%s\" must be defined.", "primary_conninfo")); + return false; + } + + /* + * The slot synchronization needs a database connection for walrcv_exec to + * work. + */ + dbname = walrcv_get_dbname_from_conninfo(PrimaryConnInfo); + if (dbname == NULL) + { + ereport(ERROR, + + /* + * translator: 'dbname' is a specific option; %s is a GUC variable + * name + */ + errcode(ERRCODE_INVALID_PARAMETER_VALUE), + errmsg("bad configuration for slot synchronization"), + errhint("'dbname' must be specified in \"%s\".", "primary_conninfo")); + return false; + } + + return true; +} The code of this function has been flip-flopping between versions. Now, it is always giving an ERROR when something is wrong, so all of the "return false" are unreachable. It also means the function comment is wrong, and the boolean return is unused/unnecessary. ~~~ 6. SlotSyncShmemInit +/* + * Allocate and initialize slot sync shared memory. + */ This comment should use the same style wording as the other nearby shmem function comments. SUGGESTION Allocate and initialize the shared memory of slot synchronization. ~~~ 7. +/* + * Cleanup the shared memory of slot synchronization. + */ +static void +SlotSyncShmemExit(int code, Datum arg) Since this is static, should it use the snake case naming convention? -- e.g. slot_sync_shmem_exit. ~~~ 8. +/* + * Register the callback function to clean up the shared memory of slot + * synchronization. + */ +void +SlotSyncInitialize(void) +{ + before_shmem_exit(SlotSyncShmemExit, 0); +} This is only doing registration for cleanup of shmem stuff. So, does it really need it to be a separate function, or can this be registered within SlotSyncShmemInit() itself? ~~~ 9. SyncReplicationSlots + PG_TRY(); + { + validate_primary_slot_name(wrconn); + + (void) synchronize_slots(wrconn); + } + PG_FINALLY(); + { + if (syncing_slots) + { + SpinLockAcquire(&SlotSyncCtx->mutex); + SlotSyncCtx->syncing = false; + SpinLockRelease(&SlotSyncCtx->mutex); + + syncing_slots = false; + } + + walrcv_disconnect(wrconn); + } + PG_END_TRY(); IIUC, the "if (syncing_slots)" part is not really for normal operation, but it is a safe-guard for cleaning up if some unexpected ERROR happens. Maybe there should be a comment to say that. ====== src/test/recovery/t/040_standby_failover_slots_sync.pl 10. +# Confirm that the logical failover slot is created on the standby and is +# flagged as 'synced' +is($standby1->safe_psql('postgres', + q{SELECT count(*) = 2 FROM pg_replication_slots WHERE slot_name IN ('lsub1_slot', 'lsub2_slot') AND synced;}), + "t", + 'logical slots have synced as true on standby'); /slot is created/slots are created/ /and is flagged/and are flagged/ ====== Kind Regards, Peter Smith. Fujitsu Australia
pgsql-hackers by date: