Re: Introduce XID age and inactive timeout based replication slot invalidation - Mailing list pgsql-hackers

From Shlok Kyal
Subject Re: Introduce XID age and inactive timeout based replication slot invalidation
Date
Msg-id CANhcyEUp12ZGREGMKdLtoZqO353-bEaqe2krcuTAz10h27qv7w@mail.gmail.com
Whole thread Raw
In response to Re: Introduce XID age and inactive timeout based replication slot invalidation  (Amit Kapila <amit.kapila16@gmail.com>)
List pgsql-hackers
On Tue, 4 Feb 2025 at 10:45, Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Mon, Feb 3, 2025 at 6:35 PM Shlok Kyal <shlok.kyal.oss@gmail.com> wrote:
> >
> > I reviewed the v66 patch. I have few comments:
> >
> > 1. I also feel the default value should be set to '0' as suggested by
> > Vignesh in 1st point of [1].
> >
>
> +1. This will ensure that the idle slots won't be invalidated by
> default, the same as HEAD. We can change the default value based on
> user inputs.
>
> > 2. Should we allow copying of invalidated slots?
> > Currently we are able to copy slots which are invalidated:
> >
> > postgres=# select slot_name, active, restart_lsn, wal_status,
> > inactive_since , invalidation_reason from pg_replication_slots;
> >  slot_name | active | restart_lsn | wal_status |
> > inactive_since          | invalidation_reason
> > -----------+--------+-------------+------------+----------------------------------+---------------------
> >  test1     | f      | 0/16FDDE0   | lost       | 2025-02-03
> > 18:28:01.802463+05:30 | idle_timeout
> > (1 row)
> >
> > postgres=# select pg_copy_logical_replication_slot('test1', 'test2');
> >  pg_copy_logical_replication_slot
> > ----------------------------------
> >  (test2,0/16FDE18)
> > (1 row)
> >
> > postgres=# select slot_name, active, restart_lsn, wal_status,
> > inactive_since , invalidation_reason from pg_replication_slots;
> >  slot_name | active | restart_lsn | wal_status |
> > inactive_since          | invalidation_reason
> > -----------+--------+-------------+------------+----------------------------------+---------------------
> >  test1     | f      | 0/16FDDE0   | lost       | 2025-02-03
> > 18:28:01.802463+05:30 | idle_timeout
> >  test2     | f      | 0/16FDDE0   | reserved   | 2025-02-03
> > 18:29:53.478023+05:30 |
> > (2 rows)
> >
>
> Is this related to this patch or the behavior of HEAD? If this
> behavior is not introduced by this patch then we should discuss this
> in a separate thread. I couldn't think of why anyone wants to copy the
> invalid slots, so we should probably prohibit copying invalid slots
> but that is a matter of separate discussion unless introduced by this
> patch.
>

Hi Amit,

I tested and found that this issue is present in HEAD as well.

There are three types of invalidation in HEAD:
1. "wal_removed"
2. "rows_removed"
3. "wal_level_insufficient"

for copying slot with invalidation "wal_removed" we get an error:

postgres=# select slot_name, active, active_pid, restart_lsn,
wal_status, invalidation_reason from pg_replication_slots;
slot_name | active | active_pid | restart_lsn | wal_status | invalidation_reason
-----------+--------+------------+-------------+------------+---------------------
test1     | f      |            |             | lost       | wal_removed
(1 row)
postgres=#  select pg_copy_logical_replication_slot('test1', 'test2');
ERROR:  cannot copy a replication slot that doesn't reserve WAL


But for slot with invalidation "rows_removed" and
"wal_level_insufficient" we are able to copy the slot:

postgres=# select slot_name, active, active_pid, restart_lsn,
wal_status, invalidation_reason from pg_replication_slots;
slot_name | active | active_pid | restart_lsn | wal_status | invalidation_reason
-----------+--------+------------+-------------+------------+---------------------
slot1     | f      |            | 0/302E718   | lost       | rows_removed
(1 row)
postgres=# select pg_copy_logical_replication_slot('slot1', 'slot2');
pg_copy_logical_replication_slot
----------------------------------
(slot2,0/302E770)
(1 row)
postgres=# select slot_name, active, active_pid, restart_lsn,
wal_status, invalidation_reason from pg_replication_slots;
slot_name | active | active_pid | restart_lsn | wal_status | invalidation_reason
-----------+--------+------------+-------------+------------+---------------------
slot1     | f      |            | 0/302E718   | lost       | rows_removed
slot2     | f      |            | 0/302E718   | reserved   |
(2 rows)

Similarly we can copy slot with invalidation "wal_level_insufficient".
I have started a new thread to address the issue [1].

[1]: https://www.postgresql.org/message-id/CANhcyEU65aH0VYnLiu=OhNNxhnhNhwcXBeT-jvRe1OiJTo_Ayg@mail.gmail.com

Thanks and Regards,
Shlok Kyal



pgsql-hackers by date:

Previous
From: Masahiko Sawada
Date:
Subject: Re: Fix assert failure when decoding XLOG_PARAMETER_CHANGE on primary
Next
From: Ashutosh Bapat
Date:
Subject: Re: Enhance 'pg_createsubscriber' to retrieve databases automatically when no database is provided.