Re: Introduce XID age and inactive timeout based replication slot invalidation - Mailing list pgsql-hackers
From | Bharath Rupireddy |
---|---|
Subject | Re: Introduce XID age and inactive timeout based replication slot invalidation |
Date | |
Msg-id | CALj2ACWqLneo2kqP5DEUwDXyHoM5zPn2vE+27oB+Cse9Rvzo_A@mail.gmail.com Whole thread Raw |
In response to | Re: Introduce XID age and inactive timeout based replication slot invalidation (shveta malik <shveta.malik@gmail.com>) |
List | pgsql-hackers |
Hi, Thanks for reviewing. On Tue, Sep 3, 2024 at 3:01 PM shveta malik <shveta.malik@gmail.com> wrote: > > 1) > I see that ReplicationSlotAlter() will error out if the slot is > invalidated due to timeout. I have not tested it myself, but do you > know if slot-alter errors out for other invalidation causes as well? > Just wanted to confirm that the behaviour is consistent for all > invalidation causes. Will respond to Amit's comment soon. > 2) > When a slot is invalidated, and we try to use that slot, it gives this msg: > > ERROR: can no longer get changes from replication slot "mysubnew1_2" > DETAIL: The slot became invalid because it was inactive since > 2024-09-03 14:23:34.094067+05:30, which is more than 600 seconds ago. > HINT: You might need to increase "replication_slot_inactive_timeout.". > > Isn't HINT misleading? Even if we increase it now, the slot can not be > reused again. > > Below is one side effect if inactive_since keeps on changing: > > postgres=# SELECT * FROM pg_replication_slot_advance('mysubnew1_1', > pg_current_wal_lsn()); > ERROR: can no longer get changes from replication slot "mysubnew1_1" > DETAIL: The slot became invalid because it was inactive since > 2024-09-04 10:03:56.68053+05:30, which is more than 10 seconds ago. > HINT: You might need to increase "replication_slot_inactive_timeout.". > > postgres=# select now(); > now > --------------------------------- > 2024-09-04 10:04:00.26564+05:30 > > 'DETAIL' gives wrong information, we are not past 10-seconds. This is > because inactive_since got updated even in ERROR scenario. > > ERROR: can no longer get changes from replication slot "mysubnew1_1" > DETAIL: The slot became invalid because it was inactive since > 2024-09-04 10:06:38.980939+05:30, which is more than 129600 seconds > ago. > postgres=# select now(); > now > ---------------------------------- > 2024-09-04 10:07:35.201894+05:30 > > I feel we should change this message itself. Removed the hint and corrected the detail message as following: errmsg("can no longer get changes from replication slot \"%s\"", NameStr(s->data.name)), errdetail("This slot has been invalidated because it was inactive for longer than the amount of time specified by \"%s\".", "replication_slot_inactive_timeout."))); > 3) > When the slot is invalidated, the' inactive_since' still keeps on > changing when there is a subscriber trying to start replication > continuously. I think ReplicationSlotAcquire() keeps on failing and > thus Release keeps on setting it again and again. Shouldn't we stop > setting/chnaging 'inactive_since' once the slot is invalidated > already, otherwise it will be misleading. > > postgres=# select failover,synced,inactive_since,invalidation_reason > from pg_replication_slots; > > failover | synced | inactive_since | invalidation_reason > ----------+--------+----------------------------------+--------------------- > t | f | 2024-09-03 14:23:.. | inactive_timeout > > after sometime: > failover | synced | inactive_since | invalidation_reason > ----------+--------+----------------------------------+--------------------- > t | f | 2024-09-03 14:26:..| inactive_timeout Changed it to not update inactive_since for slots invalidated due to inactive timeout. > 4) > src/sgml/config.sgml: > > 4a) > + A value of zero (which is default) disables the timeout mechanism. > > Better will be: > A value of zero (which is default) disables the inactive timeout > invalidation mechanism . Changed. > 4b) > 'synced' and inactive_since should point to pg_replication_slots: > > example: > <link linkend="view-pg-replication-slots">pg_replication_slots</link>.<structfield>synced</structfield> Modified. > 5) > src/sgml/system-views.sgml: > + ..the slot has been inactive for longer than the duration specified > by replication_slot_inactive_timeout parameter. > > Better to have: > ..the slot has been inactive for a time longer than the duration > specified by the replication_slot_inactive_timeout parameter. Changed it to the following to be consistent with the config.sgml. <literal>inactive_timeout</literal> means that the slot has been inactive for longer than the amount of time specified by the <xref linkend="guc-replication-slot-inactive-timeout"/> parameter. Please find the v45 patch posted upthread at https://www.postgresql.org/message-id/CALj2ACWXQT3_HY40ceqKf1DadjLQP6b1r%3D0sZRh-xhAOd-b0pA%40mail.gmail.com for the changes. -- Bharath Rupireddy PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
pgsql-hackers by date: