Re: Introduce XID age and inactive timeout based replication slot invalidation - Mailing list pgsql-hackers

From Bertrand Drouvot
Subject Re: Introduce XID age and inactive timeout based replication slot invalidation
Date
Msg-id ZgJjP8X3J0c7fMk9@ip-10-97-1-34.eu-west-3.compute.internal
Whole thread Raw
In response to Re: Introduce XID age and inactive timeout based replication slot invalidation  (shveta malik <shveta.malik@gmail.com>)
Responses Re: Introduce XID age and inactive timeout based replication slot invalidation
List pgsql-hackers
Hi,

On Tue, Mar 26, 2024 at 09:30:32AM +0530, shveta malik wrote:
> On Mon, Mar 25, 2024 at 12:43 PM shveta malik <shveta.malik@gmail.com> wrote:
> >
> > I have one concern, for synced slots on standby, how do we disallow
> > invalidation due to inactive-timeout immediately after promotion?
> >
> > For synced slots, last_inactive_time and inactive_timeout are both
> > set. Let's say I bring down primary for promotion of standby and then
> > promote standby, there are chances that it may end up invalidating
> > synced slots (considering standby is not brought down during promotion
> > and thus inactive_timeout may already be past 'last_inactive_time').
> >
> 
> On standby, if we decide to maintain valid last_inactive_time for
> synced slots, then invalidation is correctly restricted in
> InvalidateSlotForInactiveTimeout() for synced slots using the check:
> 
>         if (RecoveryInProgress() && slot->data.synced)
>                 return false;

Right.

> But immediately after promotion, we can not rely on the above check
> and thus possibility of synced slots invalidation is there. To
> maintain consistent behavior regarding the setting of
> last_inactive_time for synced slots, similar to user slots, one
> potential solution to prevent this invalidation issue is to update the
> last_inactive_time of all synced slots within the ShutDownSlotSync()
> function during FinishWalRecovery(). This approach ensures that
> promotion doesn't immediately invalidate slots, and henceforth, we
> possess a correct last_inactive_time as a basis for invalidation going
> forward. This will be equivalent to updating last_inactive_time during
> restart (but without actual restart during promotion).
> The plus point of maintaining last_inactive_time for synced slots
> could be, this can provide data to the user on when last time the sync
> was attempted on that particular slot by background slot sync worker
> or SQl function. Thoughts?

Yeah, another plus point is that if the primary is down then one could look
at the synced "active_since" on the standby to get an idea of it (depends of the
last sync though).

The issue that I can see with your proposal is: what if one synced the slots
manually (with pg_sync_replication_slots()) but does not use the sync worker?
Then I think ShutDownSlotSync() is not going to help in that case.

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



pgsql-hackers by date:

Previous
From: Bharath Rupireddy
Date:
Subject: Re: pgsql: Track last_inactive_time in pg_replication_slots.
Next
From: Amit Kapila
Date:
Subject: Re: Introduce XID age and inactive timeout based replication slot invalidation