Re: Introduce XID age and inactive timeout based replication slot invalidation - Mailing list pgsql-hackers

From Amit Kapila
Subject Re: Introduce XID age and inactive timeout based replication slot invalidation
Date
Msg-id CAA4eK1Je_C2Rw0L4tQ5R2vqoDhP7L76RXMsx7jwA1CmoG-LHFQ@mail.gmail.com
Whole thread Raw
In response to Re: Introduce XID age and inactive timeout based replication slot invalidation  (Bertrand Drouvot <bertranddrouvot.pg@gmail.com>)
Responses Re: Introduce XID age and inactive timeout based replication slot invalidation
List pgsql-hackers
On Fri, Mar 22, 2024 at 2:27 PM Bertrand Drouvot
<bertranddrouvot.pg@gmail.com> wrote:
>
> On Fri, Mar 22, 2024 at 01:45:01PM +0530, Bharath Rupireddy wrote:
>
> > 0001 Track invalidation_reason in pg_replication_slots
> > 0002 Track last_inactive_at in pg_replication_slots
> > 0003 Allow setting inactive_timeout for replication slots via SQL API
> > 0004 Introduce new SQL funtion pg_alter_replication_slot
> > 0005 Allow setting inactive_timeout in the replication command
> > 0006 Add inactive_timeout based replication slot invalidation
> >
> > 1. Keep it last_inactive_at as a shared memory variable, but always
> > set it at restart if the slot's inactive_timeout has non-zero value
> > and reset it as soon as someone acquires that slot so that if the slot
> > doesn't get acquired  till inactive_timeout, checkpointer will
> > invalidate the slot.
> > 4. last_inactive_at should also be set to the current time during slot
> > creation because if one creates a slot and does nothing with it then
> > it's the time it starts to be inactive.
>
> I did not look at the code yet but just tested the behavior. It works as you
> describe it but I think this behavior is weird because:
>
> - when we create a slot without a timeout then last_inactive_at is set. I think
> that's fine, but then:
> - when we restart the engine, then last_inactive_at is gone (as timeout is not
> set).
>
> I think last_inactive_at should be set also at engine restart even if there is
> no timeout.

I think it is the opposite. Why do we need to set  'last_inactive_at'
when inactive_timeout is not set? BTW, haven't we discussed that we
don't need to set 'last_inactive_at' at the time of slot creation as
it is sufficient to set it at the time ReplicationSlotRelease()?

A few other comments:
==================
1.
@@ -1027,7 +1027,8 @@ CREATE VIEW pg_replication_slots AS
             L.invalidation_reason,
             L.failover,
             L.synced,
-            L.last_inactive_at
+            L.last_inactive_at,
+            L.inactive_timeout

I think it would be better to keep 'inactive_timeout' ahead of
'last_inactive_at' as that is the primary field. In major versions, we
don't have to strictly keep the new fields at the end. In this case,
it seems better to keep these two new fields after two_phase so that
these are before invalidation_reason where we can show the
invalidation due to these fields.

2.
 void
-ReplicationSlotRelease(void)
+ReplicationSlotRelease(bool set_last_inactive_at)

Why do we need a parameter here? Can't we directly check from the slot
whether 'inactive_timeout' has a non-zero value?

3.
+ /*
+ * There's no point in allowing failover slots to get invalidated
+ * based on slot's inactive_timeout parameter on standby. The failover
+ * slots simply get synced from the primary on the standby.
+ */
+ if (!(RecoveryInProgress() && slot->data.failover))

I think you need to check 'sync' flag instead of 'failover'.
Generally, failover marker slots should be invalidated either on
primary or standby unless on standby the 'failover' marked slot is
synced from the primary.

4. I feel the patches should be arranged like 0003->0001, 0002->0002,
0006->0003. We can leave remaining for the time being till we get
these three patches (all three need to be committed as one but it is
okay to keep them separate for review) committed.

--
With Regards,
Amit Kapila.



pgsql-hackers by date:

Previous
From: Amit Langote
Date:
Subject: Re: Memory consumed by child SpecialJoinInfo in partitionwise join planning
Next
From: Bertrand Drouvot
Date:
Subject: Re: Introduce XID age and inactive timeout based replication slot invalidation