Re: Synchronizing slots from primary to standby - Mailing list pgsql-hackers
From | shveta malik |
---|---|
Subject | Re: Synchronizing slots from primary to standby |
Date | |
Msg-id | CAJpy0uCRykgTBwu9tTTdxxcM-r9FAerAO4yJgWZc7MFb-06gSw@mail.gmail.com Whole thread Raw |
In response to | Re: Synchronizing slots from primary to standby (Peter Smith <smithpb2250@gmail.com>) |
List | pgsql-hackers |
On Tue, Oct 10, 2023 at 12:52 PM Peter Smith <smithpb2250@gmail.com> wrote: > > On Mon, Oct 9, 2023 at 9:34 PM shveta malik <shveta.malik@gmail.com> wrote: > > > > On Wed, Oct 4, 2023 at 8:53 AM Peter Smith <smithpb2250@gmail.com> wrote: > > > > > > Here are some review comments for v20-0002. > > > > > > > Thanks Peter for the feedback. Comments from 31 till end are addressed > > in v22. First 30 comments will be addressed in the next version. > > > > Thanks for addressing my previous comments. > > I checked those and went through other changes in v22-0002 to give a > few more review comments below. > Thank You for your feedback. I have addressed these in v23. > I understand there are some design changes coming soon regarding the > use of GUCs so maybe a few of these comments will become redundant. > > ====== > doc/src/sgml/config.sgml > > 1. > A password needs to be provided too, if the sender demands password > authentication. It can be provided in the > <varname>primary_conninfo</varname> string, or in a separate > - <filename>~/.pgpass</filename> file on the standby server (use > - <literal>replication</literal> as the database name). > - Do not specify a database name in the > - <varname>primary_conninfo</varname> string. > + <filename>~/.pgpass</filename> file on the standby server. > + </para> > + <para> > + Specify a database name in <varname>primary_conninfo</varname> string > + to allow synchronization of slots from the primary to standby. This > + dbname will only be used for slots synchronization purpose and will > + be irrelevant for streaming. > </para> > > 1a. > "Specify a database name in...". Shouldn't that say "Specify dbname in..."? > > ~ > > 1b. > BEFORE > This dbname will only be used for slots synchronization purpose and > will be irrelevant for streaming. > > SUGGESTION > This will only be used for slot synchronization. It is ignored for streaming. > > ====== > doc/src/sgml/system-views.sgml > > 2. pg_replication_slots > > + <row> > + <entry role="catalog_table_entry"><para role="column_definition"> > + <structfield>synced_slot</structfield> <type>bool</type> > + </para> > + <para> > + True if this logical slot is created on physical standby as part of > + slot-synchronization from primary server. Always false for > physical slots. > + </para></entry> > + </row> > > /on physical standby/on the physical standby/ > > /from primary server/from the primary server/ > > ====== > src/backend/replication/logical/launcher.c > > 3. LaunchSlotSyncWorkers > > + /* > + * If we failed to launch this slotsync worker, return and try > + * launching rest of the workers in next sync cycle. But change > + * launcher's wait time to minimum of wal_retrieve_retry_interval and > + * default wait time to try next sync-cycle sooner. > + */ > > 3a. > Use consistent terms -- choose "sync cycle" or "sync-cycle" > > ~ > > 3b. > Is it correct to just say "rest of the workers"; won't it also try to > relaunch this same failed worker again? > > ~~~ > > 4. LauncherMain > > + /* > + * Stop the slot-sync workers if any of the related GUCs changed. > + * These will be relaunched using the new values during next > + * sync-cycle. Also revalidate the new configurations and > + * reconnect. > + */ > + if (SlotSyncConfigsChanged()) > + { > + slotsync_workers_stop(); > + > + if (wrconn) > + walrcv_disconnect(wrconn); > + > + if (RecoveryInProgress()) > + wrconn = slotsync_remote_connect(); > + } > > Was it overkill to disconnect/reconnect every time any of those GUCs > changed? Or is it enough to do that only if the > PrimaryConnInfoPreReload was changed? > > ====== > src/backend/replication/logical/logical.c > > 5. CreateDecodingContext > > + /* > + * Do not allow consumption of a "synchronized" slot until the standby > + * gets promoted. > + */ > + if (RecoveryInProgress() && slot->data.synced) > + ereport(ERROR, > + (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE), > + errmsg("cannot use replication slot \"%s\" for logical decoding", > + NameStr(slot->data.name)), > + errdetail("This slot is being synced from primary."), > + errhint("Specify another replication slot."))); > + > > /from primary/from the primary/ > > ====== > src/backend/replication/logical/slotsync.c > > 6. use_slot_in_query > > + /* > + * Return TRUE if either slot is not yet created on standby or if it > + * belongs to one of the dbs passed in dbids. > + */ > + if (!slot_found || relevant_db) > + return true; > + > + return false; > > Same as single line: > > return (!slot_found || relevant_db); > > ~~~ > > 7. synchronize_one_slot > > + /* > + * If the local restart_lsn and/or local catalog_xmin is ahead of > + * those on the remote then we cannot create the local slot in sync > + * with primary because that would mean moving local slot backwards > + * and we might not have WALs retained for old lsns. In this case we > + * will wait for primary's restart_lsn and catalog_xmin to catch up > + * with the local one before attempting the sync. > + */ > > /moving local slot/moving the local slot/ > > /with primary/with the primary/ > > /wait for primary's/wait for the primary's/ > > ~~~ > > 8. ProcessSlotSyncInterrupts > > + if (ConfigReloadPending) > + { > + ConfigReloadPending = false; > + > + /* Save the PrimaryConnInfo before reloading */ > + *conninfo_prev = pstrdup(PrimaryConnInfo); > > If the configuration keeps changing then there might be a slow leak > here because I didn't notice anywhere where this strdup'ed string is > getting freed. Is that something worth worrying about? > > ====== > src/backend/replication/slot.c > > 9. ReplicationSlotDrop > > + /* > + * Do not allow users to drop the slots which are currently being synced > + * from the primary to standby. > + */ > + if (user_cmd && RecoveryInProgress() && MyReplicationSlot->data.synced) > + { > + ReplicationSlotRelease(); > + ereport(ERROR, > + (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE), > + errmsg("cannot drop replication slot"), > + errdetail("This slot is being synced from primary."))); > + } > + > > 9a. > /to standby/to the standby/ > > ~ > > 9b. > Shouldn't the errmsg name the slot? Otherwise, the message might not > be so useful. > > ~ > > 9c. > /synced from primary/synced from the primary/ > > ====== > src/backend/replication/walsender.c > > > 10. ListSlotDatabaseOIDs > > + LWLockAcquire(ReplicationSlotControlLock, LW_SHARED); > + for (slotno = 0; slotno < max_replication_slots; slotno++) > + { > + ReplicationSlot *slot = &ReplicationSlotCtl->replication_slots[slotno]; > > This is all new code so you can use C99 for loop variable declaration here. > > ~~~ > > 11. > + /* If synchronize_slot_names is '*', then skip physical slots */ > + if (SlotIsPhysical(slot)) > + continue; > + > > > Some mental gymnastics are needed to understand how this code means " > synchronize_slot_names is '*'". > > IMO it would be easier to understand if the previous "if > (numslot_names)" was rewritten as if/else. > > ====== > .../utils/activity/wait_event_names.txt > > 12. > RECOVERY_WAL_STREAM "Waiting in main loop of startup process for WAL > to arrive, during streaming recovery." > +REPL_SLOTSYNC_MAIN "Waiting in main loop of worker for synchronizing > slots to a standby from primary." > +REPL_SLOTSYNC_PRIMARY_CATCHP "Waiting for primary to catch-up in > worker for synchronizing slots to a standby from primary." > SYSLOGGER_MAIN "Waiting in main loop of syslogger process." > > 12a. > Maybe those descriptions can be simplified a bit? > > SUGGESTION > REPL_SLOTSYNC_MAIN "Waiting in the main loop of slot-sync worker." > REPL_SLOTSYNC_PRIMARY_CATCHP "Waiting for the primary to catch up, in > slot-sync worker." > > ~ > > 12b. > typo? > > /REPL_SLOTSYNC_PRIMARY_CATCHP/REPL_SLOTSYNC_PRIMARY_CATCHUP/ > > ====== > src/include/replication/walreceiver.h > > 13. WalRcvRepSlotDbData > > +/* > + * Slot's DBid related data > + */ > +typedef struct WalRcvRepSlotDbData > +{ > + Oid database; /* Slot's DBid received from remote */ > +} WalRcvRepSlotDbData; > > Just calling this new field 'database' seems odd. Searching PG src I > found typical fields/variables like this one are called 'databaseid', > or 'dboid', or 'dbid', or 'db_id' etc. > > ====== > Kind Regards, > Peter Smith. > Fujitsu Australia
pgsql-hackers by date: