Re: Synchronizing slots from primary to standby - Mailing list pgsql-hackers
From | Peter Smith |
---|---|
Subject | Re: Synchronizing slots from primary to standby |
Date | |
Msg-id | CAHut+Pv88vp9mNxX37c_Bc5FDBsTS+dhV02Vgip9Wqwh7GBYSg@mail.gmail.com Whole thread Raw |
In response to | Re: Synchronizing slots from primary to standby (shveta malik <shveta.malik@gmail.com>) |
Responses |
Re: Synchronizing slots from primary to standby
RE: Synchronizing slots from primary to standby |
List | pgsql-hackers |
Here are some review comments for patch v81-0001. ====== 1. GENERAL - ReplicationSlotInvalidationCause enum. I was thinking that the ReplicationSlotInvalidationCause should explicitly set RS_INVAL_NONE = 0 (it's zero anyway, but making it explicit with a comment /* Must be zero. */. will stop it from being changed in the future). ------ /* * Slots can be invalidated, e.g. due to max_slot_wal_keep_size. If so, the * 'invalidated' field is set to a value other than _NONE. */ typedef enum ReplicationSlotInvalidationCause { RS_INVAL_NONE = 0, /* Must be zero. */ ... } ReplicationSlotInvalidationCause; ------ The reason to do this is because many places in the patch check for RS_INVAL_NONE, but if RS_INVAL_NONE == 0 is assured, all those code fragments can be simplified and IMO also become more readable. e.g. update_local_synced_slot() BEFORE Assert(slot->data.invalidated == RS_INVAL_NONE); AFTER Assert(!slot->data.invalidated); ~ e.g. local_sync_slot_required() BEFORE locally_invalidated = (remote_slot->invalidated == RS_INVAL_NONE) && (local_slot->data.invalidated != RS_INVAL_NONE); AFTER locally_invalidated = !remote_slot->invalidated && local_slot->data.invalidated; ~ e.g. synchronize_one_slot() BEFORE if (slot->data.invalidated == RS_INVAL_NONE && remote_slot->invalidated != RS_INVAL_NONE) AFTER if (!slot->data.invalidated && remote_slot->invalidated; BEFORE /* Skip the sync of an invalidated slot */ if (slot->data.invalidated != RS_INVAL_NONE) AFTER /* Skip the sync of an invalidated slot */ if (slot->data.invalidated) BEFORE /* Skip creating the local slot if remote_slot is invalidated already */ if (remote_slot->invalidated != RS_INVAL_NONE) AFTER /* Skip creating the local slot if remote_slot is invalidated already */ if (remote_slot->invalidated) ~ e.g. synchronize_slots() BEFORE if ((XLogRecPtrIsInvalid(remote_slot->restart_lsn) || XLogRecPtrIsInvalid(remote_slot->confirmed_lsn) || !TransactionIdIsValid(remote_slot->catalog_xmin)) && remote_slot->invalidated == RS_INVAL_NONE) AFTER if ((XLogRecPtrIsInvalid(remote_slot->restart_lsn) || XLogRecPtrIsInvalid(remote_slot->confirmed_lsn) || !TransactionIdIsValid(remote_slot->catalog_xmin)) && !remote_slot->invalidated) ====== src/backend/replication/logical/slotsync.c 2. update_local_synced_slot + if (strcmp(remote_slot->plugin, NameStr(slot->data.plugin)) == 0 && + remote_dbid == slot->data.database && + !xmin_changed && !restart_lsn_changed && + remote_slot->two_phase == slot->data.two_phase && + remote_slot->failover == slot->data.failover && + remote_slot->confirmed_lsn == slot->data.confirmed_flush) + return false; Consider rearranging the conditions to put the strcmp later -- e.g. might as well avoid the (more expensive?) strcmp if some of those boolean tests are already false. ~~~ 3. + /* + * There is a possibility of parallel database drop by startup + * process and re-creation of new slot by user in the small window + * between getting the slot to drop and locking the db. This new + * user-created slot may end up using the same shared memory as + * that of 'local_slot'. Thus check if local_slot is still the + * synced one before performing actual drop. + */ BEFORE There is a possibility of parallel database drop by startup process and re-creation of new slot by user in the small window between getting the slot to drop and locking the db. SUGGESTION In the small window between getting the slot to drop and locking the database, there is a possibility of a parallel database drop by the startup process or the creation of a new slot by the user. ~~~ 4. +/* + * Synchronize single slot to given position. + * + * This creates a new slot if there is no existing one and updates the + * metadata of the slot as per the data received from the primary server. + * + * The slot is created as a temporary slot and stays in the same state until the + * the remote_slot catches up with locally reserved position and local slot is + * updated. The slot is then persisted and is considered as sync-ready for + * periodic syncs. + */ /Synchronize single slot to given position./Synchronize a single slot to the given position./ ~~~ 5. synchronize_slots + /* + * The primary_slot_name is not set yet or WALs not received yet. + * Synchronization is not possible if the walreceiver is not started. + */ + latestWalEnd = GetWalRcvLatestWalEnd(); + SpinLockAcquire(&WalRcv->mutex); + if ((WalRcv->slotname[0] == '\0') || + XLogRecPtrIsInvalid(latestWalEnd)) + { + SpinLockRelease(&WalRcv->mutex); + return; + } + SpinLockRelease(&WalRcv->mutex); The comment talks about the GUC "primary_slot_name", but the code is checking the WalRcv's slotname. It may be the same, but the difference is confusing. ~~~ 6. + /* + * If restart_lsn, confirmed_lsn or catalog_xmin is invalid but slot + * is not invalidated, that means we have fetched the remote_slot in + * its RS_EPHEMERAL state itself. In such a case, avoid syncing it + * yet. We can always sync it in the next sync cycle when the + * remote_slot is persisted and has valid lsn(s) and xmin values. + * + * XXX: In future, if we plan to expose 'slot->data.persistency' in + * pg_replication_slots view, then we can avoid fetching RS_EPHEMERAL + * slots in the first place. + */ SUGGESTION (1st para) If restart_lsn, confirmed_lsn or catalog_xmin is invalid but the slot is valid, that means we have fetched the remote_slot in its RS_EPHEMERAL state. In such a case, don't sync it; we can always sync it in the next ... ~~~ 7. + /* + * Use shared lock to prevent a conflict with + * ReplicationSlotsDropDBSlots(), trying to drop the same slot during + * a drop-database operation. + */ + LockSharedObject(DatabaseRelationId, remote_dbid, 0, AccessShareLock); + + synchronize_one_slot(remote_slot, remote_dbid); + + UnlockSharedObject(DatabaseRelationId, remote_dbid, 0, AccessShareLock); IMO remove the blank lines (e.g., you don't use this kind of formatting for spin locks) ====== Kind Regards, Peter Smith. Fujitsu Australia
pgsql-hackers by date: