Re: Synchronizing slots from primary to standby - Mailing list pgsql-hackers
From | shveta malik |
---|---|
Subject | Re: Synchronizing slots from primary to standby |
Date | |
Msg-id | CAJpy0uArBc_aTkBn28tKfHbhrzxQeHCCyzZbt_VZumKEWKbHZQ@mail.gmail.com Whole thread Raw |
In response to | Re: Synchronizing slots from primary to standby (Amit Kapila <amit.kapila16@gmail.com>) |
Responses |
RE: Synchronizing slots from primary to standby
|
List | pgsql-hackers |
On Thu, Feb 8, 2024 at 4:03 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > Few comments on 0001 > =================== Thanks Amit. Addressed these in v81. > 1. > + * the slots on the standby and synchronize them. This is done on every call > + * to SQL function pg_sync_replication_slots. > > > > I think the second sentence can be slightly changed to: "This is done > by a call to SQL function pg_sync_replication_slots." or "One can call > SQL function pg_sync_replication_slots to invoke this functionality." Done. > 2. > +update_local_synced_slot(RemoteSlot *remote_slot, Oid remote_dbid) > { > ... > + SpinLockAcquire(&slot->mutex); > + slot->data.plugin = plugin_name; > + slot->data.database = remote_dbid; > + slot->data.two_phase = remote_slot->two_phase; > + slot->data.failover = remote_slot->failover; > + slot->data.restart_lsn = remote_slot->restart_lsn; > + slot->data.confirmed_flush = remote_slot->confirmed_lsn; > + slot->data.catalog_xmin = remote_slot->catalog_xmin; > + slot->effective_catalog_xmin = remote_slot->catalog_xmin; > + SpinLockRelease(&slot->mutex); > + > + if (remote_slot->catalog_xmin != slot->data.catalog_xmin) > + ReplicationSlotsComputeRequiredXmin(false); > + > + if (remote_slot->restart_lsn != slot->data.restart_lsn) > + ReplicationSlotsComputeRequiredLSN(); > ... > } > > How is it possible that after assigning the values from remote_slot > they can differ from local slot values? It was a mistake while comment fixing in previous versions. Corrected it now. Thanks for catching. > 3. > + /* > + * Find the oldest existing WAL segment file. > + * > + * Normally, we can determine it by using the last removed segment > + * number. However, if no WAL segment files have been removed by a > + * checkpoint since startup, we need to search for the oldest segment > + * file currently existing in XLOGDIR. > + */ > + oldest_segno = XLogGetLastRemovedSegno() + 1; > + > + if (oldest_segno == 1) > + oldest_segno = XLogGetOldestSegno(0); > > I feel this way isn't there a risk that XLogGetOldestSegno() will get > us the seg number from some previous timeline which won't make sense > to compare segno in reserve_wal_for_local_slot. Shouldn't you need to > fetch the current timeline and send as a parameter to this function as > that is the timeline on which standby is communicating with primary. Yes, modified it. > 4. > + if (remote_slot->confirmed_lsn > latestFlushPtr) > + ereport(ERROR, > + errmsg("skipping slot synchronization as the received slot sync" > > I think the internal errors should be reported with elog as you have > done at other palces in the patch. Done. > 5. > +synchronize_one_slot(RemoteSlot *remote_slot, Oid remote_dbid) > { > ... > + /* > + * Copy the invalidation cause from remote only if local slot is not > + * invalidated locally, we don't want to overwrite existing one. > + */ > + if (slot->data.invalidated == RS_INVAL_NONE) > + { > + SpinLockAcquire(&slot->mutex); > + slot->data.invalidated = remote_slot->invalidated; > + SpinLockRelease(&slot->mutex); > + > + /* Make sure the invalidated state persists across server restart */ > + ReplicationSlotMarkDirty(); > + ReplicationSlotSave(); > + slot_updated = true; > + } > ... > } > > Do we need to copy the 'invalidated' from remote to local if both are > same? I think this will happen for each slot each time because > normally slots won't be invalidated ones, so there is needless writes. It is not needed everytime. Optimized it. Now we copy only if local_slot's 'invalidated' value is RS_INVAL_NONE while remote-slot's value != RS_INVAL_NONE. > 6. > + * Returns TRUE if any of the slots gets updated in this sync-cycle. > + */ > +static bool > +synchronize_slots(WalReceiverConn *wrconn) > ... > ... > > +void > +SyncReplicationSlots(WalReceiverConn *wrconn) > +{ > + PG_TRY(); > + { > + validate_primary_slot_name(wrconn); > + > + (void) synchronize_slots(wrconn); > > For the purpose of 0001, synchronize_slots() doesn't seems to use > return value. So, I suggest to change it accordingly and move the > return value in the required patch. Modified it. Also changed return values of all related internal functions which were returning slot_updated. > 7. > + /* > + * The primary_slot_name is not set yet or WALs not received yet. > + * Synchronization is not possible if the walreceiver is not started. > + */ > + latestWalEnd = GetWalRcvLatestWalEnd(); > + SpinLockAcquire(&WalRcv->mutex); > + if ((WalRcv->slotname[0] == '\0') || > + XLogRecPtrIsInvalid(latestWalEnd)) > + { > + SpinLockRelease(&WalRcv->mutex); > + return false; > > For the purpose of 0001, we should give WARNING here. I will fix it in the next version. Sorry, I somehow missed it this time. thanks Shveta
pgsql-hackers by date: