Re: Introduce XID age and inactive timeout based replication slot invalidation - Mailing list pgsql-hackers

From shveta malik
Subject Re: Introduce XID age and inactive timeout based replication slot invalidation
Date
Msg-id CAJpy0uD21cPp9FkSCJi+NO47Y-6wmE+UW6i_K0HJCn=2UdMDmw@mail.gmail.com
Whole thread Raw
In response to Re: Introduce XID age and inactive timeout based replication slot invalidation  (shveta malik <shveta.malik@gmail.com>)
Responses Re: Introduce XID age and inactive timeout based replication slot invalidation
List pgsql-hackers
On Wed, Sep 4, 2024 at 9:17 AM shveta malik <shveta.malik@gmail.com> wrote:
>
> On Tue, Sep 3, 2024 at 3:01 PM shveta malik <shveta.malik@gmail.com> wrote:
> >
> >


1)
It is related to one of my previous comments (pt 3 in [1]) where I
stated that inactive_since should not keep on changing once a slot is
invalidated.
Below is one side effect if inactive_since keeps on changing:

postgres=# SELECT * FROM pg_replication_slot_advance('mysubnew1_1',
pg_current_wal_lsn());
ERROR:  can no longer get changes from replication slot "mysubnew1_1"
DETAIL:  The slot became invalid because it was inactive since
2024-09-04 10:03:56.68053+05:30, which is more than 10 seconds ago.
HINT:  You might need to increase "replication_slot_inactive_timeout.".

postgres=# select now();
               now
---------------------------------
 2024-09-04 10:04:00.26564+05:30

'DETAIL' gives wrong information, we are not past 10-seconds. This is
because inactive_since got updated even in ERROR scenario.


2)
One more issue in this message is, once I set
replication_slot_inactive_timeout to a bigger value, it becomes more
misleading. This is because invalidation was done in the past using
previous value while message starts showing new value:

ALTER SYSTEM SET replication_slot_inactive_timeout TO '36h';

--see 129600 secs in DETAIL and the current time.
postgres=# SELECT * FROM pg_replication_slot_advance('mysubnew1_1',
pg_current_wal_lsn());
ERROR:  can no longer get changes from replication slot "mysubnew1_1"
DETAIL:  The slot became invalid because it was inactive since
2024-09-04 10:06:38.980939+05:30, which is more than 129600 seconds
ago.
postgres=# select now();
               now
----------------------------------
 2024-09-04 10:07:35.201894+05:30

I feel we should change this message itself.

~~~~~

When invalidation is due to wal_removed, we get a way simpler message:

newdb1=# SELECT * FROM pg_replication_slot_advance('mysubnew1_2',
pg_current_wal_lsn());
ERROR:  replication slot "mysubnew1_2" cannot be advanced
DETAIL:  This slot has never previously reserved WAL, or it has been
invalidated.

This message does not mention 'max_slot_wal_keep_size'. We should have
a similar message for our case. Thoughts?

[1]:  https://www.postgresql.org/message-id/CAJpy0uC8Dg-0JS3NRUwVUemgz5Ar2v3_EQQFXyAigWSEQ8U47Q%40mail.gmail.com

thanks
Shveta



pgsql-hackers by date:

Previous
From: jian he
Date:
Subject: Re: Add memory/disk usage for WindowAgg nodes in EXPLAIN
Next
From: Peter Eisentraut
Date:
Subject: Re: [PoC] Federated Authn/z with OAUTHBEARER