On Sat, Aug 31, 2024 at 1:45 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
>
> Hi,
>
>
> Please find the attached v44 patch with the above changes. I will
> include the 0002 xid_age based invalidation patch later.
>
Thanks for the patch Bharath. My review and testing is WIP, but please
find few comments and queries:
1)
I see that ReplicationSlotAlter() will error out if the slot is
invalidated due to timeout. I have not tested it myself, but do you
know if slot-alter errors out for other invalidation causes as well?
Just wanted to confirm that the behaviour is consistent for all
invalidation causes.
2)
When a slot is invalidated, and we try to use that slot, it gives this msg:
ERROR: can no longer get changes from replication slot "mysubnew1_2"
DETAIL: The slot became invalid because it was inactive since
2024-09-03 14:23:34.094067+05:30, which is more than 600 seconds ago.
HINT: You might need to increase "replication_slot_inactive_timeout.".
Isn't HINT misleading? Even if we increase it now, the slot can not be
reused again.
3)
When the slot is invalidated, the' inactive_since' still keeps on
changing when there is a subscriber trying to start replication
continuously. I think ReplicationSlotAcquire() keeps on failing and
thus Release keeps on setting it again and again. Shouldn't we stop
setting/chnaging 'inactive_since' once the slot is invalidated
already, otherwise it will be misleading.
postgres=# select failover,synced,inactive_since,invalidation_reason
from pg_replication_slots;
failover | synced | inactive_since | invalidation_reason
----------+--------+----------------------------------+---------------------
t | f | 2024-09-03 14:23:.. | inactive_timeout
after sometime:
failover | synced | inactive_since | invalidation_reason
----------+--------+----------------------------------+---------------------
t | f | 2024-09-03 14:26:..| inactive_timeout
4)
src/sgml/config.sgml:
4a)
+ A value of zero (which is default) disables the timeout mechanism.
Better will be:
A value of zero (which is default) disables the inactive timeout
invalidation mechanism .
or
A value of zero (which is default) disables the slot invalidation due
to the inactive timeout mechanism.
i.e. rephrase to indicate that invalidation is disabled.
4b)
'synced' and inactive_since should point to pg_replication_slots:
example:
<link linkend="view-pg-replication-slots">pg_replication_slots</link>.<structfield>synced</structfield>
5)
src/sgml/system-views.sgml:
+ ..the slot has been inactive for longer than the duration specified
by replication_slot_inactive_timeout parameter.
Better to have:
..the slot has been inactive for a time longer than the duration
specified by the replication_slot_inactive_timeout parameter.
thanks
Shveta