Re: Clear logical slot's 'synced' flag on promotion of standby - Mailing list pgsql-hackers
From | shveta malik |
---|---|
Subject | Re: Clear logical slot's 'synced' flag on promotion of standby |
Date | |
Msg-id | CAJpy0uAv+9=2wFJ6aBQDpPj0SAywwVf=FHRFNjadO1KtjBectQ@mail.gmail.com Whole thread Raw |
In response to | Re: Clear logical slot's 'synced' flag on promotion of standby (Ashutosh Sharma <ashu.coek88@gmail.com>) |
Responses |
Re: Clear logical slot's 'synced' flag on promotion of standby
|
List | pgsql-hackers |
On Fri, Sep 19, 2025 at 7:29 PM Ashutosh Sharma <ashu.coek88@gmail.com> wrote: > > On Fri, Sep 19, 2025 at 3:04 PM shveta malik <shveta.malik@gmail.com> wrote: > > > > On Thu, Sep 18, 2025 at 5:20 PM Ashutosh Sharma <ashu.coek88@gmail.com> wrote: > > > > > > Hi Ajin, > > > > > > On Thu, Sep 18, 2025 at 4:16 PM Ajin Cherian <itsajin@gmail.com> wrote: > > > > > > > > On Fri, Sep 12, 2025 at 1:56 PM shveta malik <shveta.malik@gmail.com> wrote: > > > > > > > > > > The approach seems valid and should work, but introducing a new file > > > > > like promote.inprogress for this purpose might be excessive. We can > > > > > first try analyzing existing information to determine whether we can > > > > > distinguish between the two scenarios -- a primary in crash recovery > > > > > immediately after a promotion attempt versus a regular primary. If we > > > > > are unable to find any way, we can revisit the idea. > > > > > > > > > > > > > I needed a way to reset slots not only during promotion, but also > > > > after a crash that occurs while slots are being reset, so there would > > > > be a fallback mechanism to clear them again on startup. As Shveta > > > > pointed out, it wasn’t trivial to tell apart a standby restarting > > > > after crashing during promotion from a primary restarting after a > > > > crash. So I decided to just reset slots every time primary (or a > > > > standby after promotion) restarts. > > > > > > > > Because this fallback logic will run on every primary restart, it was > > > > important to minimize overhead added by the patch. After some > > > > discussion, I placed the reset logic in RestoreSlotFromDisk(), which > > > > is invoked by StartupReplicationSlots() whenever the server starts. > > > > Because RestoreSlotFromDisk() already loops through all slots, this > > > > adds minimum extra work; but also ensures the synced flag is cleared > > > > when running on a primary. > > > > > > > > The next challenge was finding a reliable flag to distinguish > > > > primaries from standbys, since we really don’t want to reset the flag > > > > on a standby. I tested StandbyMode, RecoveryInProgress(), and > > > > InRecovery. But during restarts, both RecoveryInProgress() and > > > > InRecovery are always true on both primary and standby. In all my > > > > testing, StandbyMode was the only variable that consistently > > > > differentiated between the two, which is what I used. > > > > > > > > > > +/* > > > + * ResetSyncedSlots() > > > + * > > > + * Reset all replication slots that have synced=true to synced=false. > > > + */ > > > > > > I feel this is not correct, we are force resetting sync flag status > > > for all logical slots, not just the one that is set to true. > > > > > > -- > > > > > > @@ -2664,6 +2715,10 @@ RestoreSlotFromDisk(const char *name) > > > memcpy(&slot->data, &cp.slotdata, > > > sizeof(ReplicationSlotPersistentData)); > > > > > > + /* reset synced flag if this is a primary server */ > > > + if (!StandbyMode) > > > + slot->data.synced = false; > > > + > > > > > > I think you also need to ensure that you are only doing this for the > > > logical slots, it's currently just checking if the slot is in-use or > > > not. > > > > > > > I think a better approach would be to reset synced only if it is > > marked as synced. Adding a LogicalSlot check wouldn't be incorrect, > > but IMO, it may not be necessary. > > > > Thinking further on this, I believe it’s fine even if the slot is > forcefully set to false on the primary. The slot type or its sync > status doesn’t really matter in this case. > I think we need to mark the slots dirty once we set its synced-flag to false. In such a case, we should only mark those which actually needed synced flag resetting. IMO, we need a 'synced' flag check here. thanks Shveta
pgsql-hackers by date: