Re: Improve pg_sync_replication_slots() to wait for primary to advance - Mailing list pgsql-hackers
From: Ajin Cherian
Subject: Re: Improve pg_sync_replication_slots() to wait for primary to advance
Msg-id: CAFPTHDYBZytxLaOsU1FyhgNC_BkfXPUQPihNxUB2dYp-GwTEFA@mail.gmail.com
In response to: Re: Improve pg_sync_replication_slots() to wait for primary to advance (shveta malik <shveta.malik@gmail.com>)
Responses: Re: Improve pg_sync_replication_slots() to wait for primary to advance
           Re: Improve pg_sync_replication_slots() to wait for primary to advance
List: pgsql-hackers
On Thu, Aug 14, 2025 at 4:44 PM shveta malik <shveta.malik@gmail.com> wrote:
>
> On Thu, Aug 14, 2025 at 7:28 AM Ajin Cherian <itsajin@gmail.com> wrote:
> >
> > Patch v6 attached.
>
> Thanks Ajin. Please find my comments:
>
> 1)
> SyncReplicationSlots:
> + remote_slots = fetch_or_refresh_remote_slots(wrconn, NIL);
> +
> + /* Retry until all slots are sync ready atleast */
> + for (;;)
> + {
> + bool some_slot_updated = false;
> +
> + /*
> + * Refresh the remote slot data. We keep using the original slot
> + * list, even if some slots are already sync ready, so that all
> + * slots are updated with the latest status from the primary.
> + */
> + remote_slots = fetch_or_refresh_remote_slots(wrconn, remote_slots);
>
> When the API begins, it seems we are fetching the remote list twice
> before we even sync it once. We can get rid of the
> 'fetch_or_refresh_remote_slots' call outside the loop and retain the
> inside one. At the first call, remote_slots will be NIL and thus it will
> fetch all slots, and in subsequent calls it will be a populated one.

Fixed.

> 2)
> SyncReplicationSlots:
> + /*
> + * The syscache access in fetch_or_refresh_remote_slots() needs a
> + * transaction env.
> + */
> + if (!IsTransactionState()) {
> + StartTransactionCommand();
> + started_tx = true;
> + }
>
> + if (started_tx)
> + CommitTransactionCommand();
>
> Shall we move these 2 inside fetch_or_refresh_remote_slots() (both
> worker and API flow), similar to how validate_remote_info() also has it
> inside?

I tried this but it doesn't work, because when the transaction is
committed, the memory allocated for the remote slots is also freed. So
this needs to stay on the outside.

> 3)
> SyncReplicationSlots:
> + /* Done if all slots are atleast sync ready */
> + if (!SlotSyncCtx->slot_not_persisted)
> + break;
> + else
> + {
> + /* wait for 2 seconds before retrying */
> + wait_for_slot_activity(some_slot_updated, true);
>
> No need to have an 'else' block here.
> The code can be put without an
> 'else' because the 'if', when true, breaks from the loop.

Fixed.

> 4)
> 'fetch_or_refresh_remote_slots' can be renamed to simply
> 'fetch_remote_slots', and then a comment can define the extra argument,
> because ultimately we are re-fetching some/all slots in both cases.

Done.

> 5)
> In the case of the API, wait_for_slot_activity() does not change its wait
> time based on 'some_slot_updated'. I think we can pull 'WaitLatch,
> ResetLatch' into the API function itself and not change the worker's
> wait_for_slot_activity().

Done.

> 6)
> fetch_or_refresh_remote_slots:
> + {
> + if (is_refresh)
> + {
> + ereport(WARNING,
> + errmsg("could not fetch updated failover logical slots info"
> + " from the primary server: %s",
> + res->err));
> + pfree(query.data);
> + return remote_slot_list; /* Return original list on refresh failure */
> + }
> + else
> + {
> + ereport(ERROR,
> + errmsg("could not fetch failover logical slots info from the primary server: %s",
> + res->err));
> + }
> + }
>
> I think there is no need for different behaviour here for the worker and
> the API. Since the worker errors out here, we can make the API also
> error out.

Fixed.

> 7)
> +fetch_or_refresh_remote_slots(WalReceiverConn *wrconn, List *remote_slot_list)
>
> We can name the argument 'target_slot_list' and replace the name
> 'updated_slot_list' with 'remote_slot_list'.

Fixed.

> 8)
> + /* If refreshing, free the original list structures */
> + if (is_refresh)
> + {
> + foreach(lc, remote_slot_list)
> + {
> + RemoteSlot *old_slot = (RemoteSlot *) lfirst(lc);
> + pfree(old_slot);
> + }
> + list_free(remote_slot_list);
> + }
>
> We can get rid of 'is_refresh' and simply check if
> 'target_slot_list != NIL' and free it. We can use list_free_deep instead
> of freeing each element. Having said that, it looks slightly odd to
> free the list in this function; I will think more here. Meanwhile, we
> can do this.

Fixed.
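To make the loop shape from comments 1 and 3 concrete, here is a minimal, self-contained sketch: one fetch call inside the loop (an empty input on the first pass means "fetch everything"), a break with no 'else' arm, and an OR-accumulated pending flag as suggested later in comment 11. All names here (MockSlot, mock_fetch_slots, sync_until_ready) are hypothetical stand-ins for illustration, not the actual patch code.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical simplified model of a slot being synced: it becomes
 * "persisted" (sync ready) after a fixed number of refresh attempts.
 */
typedef struct MockSlot
{
	bool		persisted;		/* has this slot reached sync-ready? */
	int			retries_needed; /* refreshes left before it is persisted */
} MockSlot;

/* Stand-in for the single fetch/refresh call inside the loop. */
static void
mock_fetch_slots(MockSlot *slots, int nslots)
{
	for (int i = 0; i < nslots; i++)
	{
		if (!slots[i].persisted && slots[i].retries_needed > 0)
			slots[i].retries_needed--;
		if (slots[i].retries_needed == 0)
			slots[i].persisted = true;
	}
}

/* Returns the number of loop cycles taken until every slot was ready. */
int
sync_until_ready(MockSlot *slots, int nslots)
{
	int			cycles = 0;

	for (;;)
	{
		bool		slot_persistence_pending = false;

		/* One fetch per cycle; first pass fetches, later passes refresh. */
		mock_fetch_slots(slots, nslots);
		cycles++;

		/* OR-accumulate so one pending slot keeps the loop going. */
		for (int i = 0; i < nslots; i++)
			slot_persistence_pending |= !slots[i].persisted;

		/* Done if all slots are at least sync ready; no 'else' needed. */
		if (!slot_persistence_pending)
			break;

		/* The real code would wait ~2 seconds here before retrying. */
	}
	return cycles;
}
```

Note how `slot_persistence_pending` is reset once per cycle and only ever set with `|=` inside the per-slot loop, so an early-iterated pending slot cannot be overwritten by a later persisted one.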
>
> 9)
> -update_and_persist_local_synced_slot(RemoteSlot *remote_slot, Oid remote_dbid)
> +update_and_persist_local_synced_slot(WalReceiverConn * wrconn,
> + RemoteSlot * remote_slot, Oid remote_dbid)
>
> We can get rid of wrconn as we are not using it. Same with the wrconn
> argument for synchronize_one_slot().

Done.

> 10)
> + /* used by pg_sync_replication_slots() API only */
> + bool slot_not_persisted;
>
> We can move the comment outside the structure. We can first define it
> and then say the above line.

Done.

> 11)
> + SlotSyncCtx->slot_not_persisted = false;
>
> This may overwrite the 'slot_not_persisted' set for the previous slot
> and ultimately make it 'false' at the end of the cycle even though we
> had a few not-persisted slots at the beginning of the cycle. Should it
> be:
>
> SlotSyncCtx->slot_not_persisted |= false;

Fixed.

> 12)
> Shall we rename this to 'slot_persistence_pending' (based on many
> other modules using similar names: detach_pending, send_pending,
> callback_pending)?

Done.

> 13)
> - errmsg("could not synchronize replication slot \"%s\"",
> - remote_slot->name),
> - errdetail("Synchronization could lead to data loss, because the
> remote slot needs WAL at LSN %X/%08X and catalog xmin %u, but the
> standby has LSN %X/%08X and catalog xmin %u.",
> - LSN_FORMAT_ARGS(remote_slot->restart_lsn),
> - remote_slot->catalog_xmin,
> - LSN_FORMAT_ARGS(slot->data.restart_lsn),
> - slot->data.catalog_xmin));
> + errmsg("Replication slot \"%s\" is not sync ready; will keep retrying",
> + remote_slot->name),
> + errdetail("Attempting Synchronization could lead to data loss,
> because the remote slot needs WAL at LSN %X/%08X and catalog xmin %u,
> but the standby has LSN %X/%08X and catalog xmin %u.",
> + LSN_FORMAT_ARGS(remote_slot->restart_lsn),
> + remote_slot->catalog_xmin,
> + LSN_FORMAT_ARGS(slot->data.restart_lsn),
> + slot->data.catalog_xmin));
>
> We can retain the same message as it was put after a lot of
> discussion.
> We can attempt to change it if others comment. The idea is:
> since the worker dumps it in each subsequent cycle (if such a situation
> arises), on the same basis the API can now also do so, because it is
> also performing multiple cycles now. Earlier I had suggested changing
> it for the API based on the 'continuing to wait..' messages, which are
> no longer there now.

Done.

On Thu, Aug 14, 2025 at 10:44 PM Ashutosh Bapat <ashutosh.bapat.oss@gmail.com> wrote:
>
> On Thu, Aug 14, 2025 at 12:14 PM shveta malik <shveta.malik@gmail.com> wrote:
> >
> > 8)
> > + /* If refreshing, free the original list structures */
> > + if (is_refresh)
> > + {
> > + foreach(lc, remote_slot_list)
> > + {
> > + RemoteSlot *old_slot = (RemoteSlot *) lfirst(lc);
> > + pfree(old_slot);
> > + }
> > + list_free(remote_slot_list);
> > + }
> >
> > We can get rid of 'is_refresh' and simply check if
> > 'target_slot_list != NIL' and free it. We can use list_free_deep
> > instead of freeing each element. Having said that, it looks slightly
> > odd to free the list in this function; I will think more here.
> > Meanwhile, we can do this.
>
> +1. The function prologue doesn't mention that the original list is
> deep-freed, so a caller may try to access it after this call, which
> will lead to a crash. As a safe programming practice, we should let the
> caller free the original list if it is not needed anymore, OR modify
> the input list in place and return it for the convenience of the
> caller, like all list_* interfaces do. At the least, we should document
> this behavior in the function prologue. You could also use foreach_ptr
> instead of foreach.

I've changed the logic so that it is the responsibility of the caller
to free the list.
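The ownership rule settled on here can be sketched in plain C: the refresh function allocates and returns a fresh list and never frees its input, so the caller decides when the old list dies (the analogue of PostgreSQL's list_free_deep over a List of palloc'd RemoteSlots). The types and names below (SlotNode, refresh_slot_list, slot_list_free_deep) are illustrative stand-ins, not the patch's actual API.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* A minimal singly linked list of slot entries. */
typedef struct SlotNode
{
	char		name[64];
	struct SlotNode *next;
} SlotNode;

/* Deep-free a list: frees every node, the analogue of list_free_deep(). */
void
slot_list_free_deep(SlotNode *head)
{
	while (head)
	{
		SlotNode   *next = head->next;

		free(head);
		head = next;
	}
}

/*
 * Build and return a new list mirroring the target list (a stand-in for
 * re-fetching the same slots from the primary). The input list is left
 * untouched; freeing it is the caller's responsibility.
 */
SlotNode *
refresh_slot_list(const SlotNode *target)
{
	SlotNode   *head = NULL;
	SlotNode  **tail = &head;

	for (; target; target = target->next)
	{
		SlotNode   *n = calloc(1, sizeof(SlotNode));	/* zeroed node */

		strncpy(n->name, target->name, sizeof(n->name) - 1);
		*tail = n;
		tail = &n->next;
	}
	return head;
}
```

Usage then makes the ownership explicit at the call site: `SlotNode *fresh = refresh_slot_list(old); slot_list_free_deep(old);` — nothing is freed behind the caller's back, which is exactly the hazard Ashutosh's +1 points out.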
> > 13)
> > - errmsg("could not synchronize replication slot \"%s\"",
> > - remote_slot->name),
> > - errdetail("Synchronization could lead to data loss, because the
> > remote slot needs WAL at LSN %X/%08X and catalog xmin %u, but the
> > standby has LSN %X/%08X and catalog xmin %u.",
> > - LSN_FORMAT_ARGS(remote_slot->restart_lsn),
> > - remote_slot->catalog_xmin,
> > - LSN_FORMAT_ARGS(slot->data.restart_lsn),
> > - slot->data.catalog_xmin));
> > + errmsg("Replication slot \"%s\" is not sync ready; will keep retrying",
> > + remote_slot->name),
> > + errdetail("Attempting Synchronization could lead to data loss,
> > because the remote slot needs WAL at LSN %X/%08X and catalog xmin %u,
> > but the standby has LSN %X/%08X and catalog xmin %u.",
> > + LSN_FORMAT_ARGS(remote_slot->restart_lsn),
> > + remote_slot->catalog_xmin,
> > + LSN_FORMAT_ARGS(slot->data.restart_lsn),
> > + slot->data.catalog_xmin));
> >
> > We can retain the same message as it was put after a lot of
> > discussion. We can attempt to change it if others comment. The idea
> > is: since the worker dumps it in each subsequent cycle (if such a
> > situation arises), on the same basis the API can now also do so,
> > because it is also performing multiple cycles now. Earlier I had
> > suggested changing it for the API based on the 'continuing to wait..'
> > messages, which are no longer there now.
>
> Also, we usually don't use capital letters at the start of an error
> message. Any reason this is different?

Retained the old message.

> Some more:
>
> + * When called from pg_sync_replication_slots, use a fixed 2
> + * second wait time.
>
> The function prologue doesn't mention this. Probably the prologue
> should contain only the first sentence there. The rest of the prologue
> just repeats comments in the function. The function is small enough
> that a reader could read the details from the function instead of the
> prologue.
>
> + wait_time = SLOTSYNC_API_NAPTIME_MS;
> + } else {
>
> '} else' and '{' should be on separate lines.

I've removed the changes in this function, and it is now the same as
before.

Attaching patch v7 addressing all the above comments.

regards,
Ajin Cherian
Fujitsu Australia