Re: Introduce XID age and inactive timeout based replication slot invalidation - Mailing list pgsql-hackers

From Nisha Moond
Subject Re: Introduce XID age and inactive timeout based replication slot invalidation
Date
Msg-id CABdArM5ENt1fFxGjx6-NQ863ia1U9poqC7XCMcc178syr7sdYg@mail.gmail.com
Whole thread Raw
In response to RE: Introduce XID age and inactive timeout based replication slot invalidation  ("Hayato Kuroda (Fujitsu)" <kuroda.hayato@fujitsu.com>)
List pgsql-hackers
On Thu, Nov 28, 2024 at 1:29 PM Hayato Kuroda (Fujitsu)
<kuroda.hayato@fujitsu.com> wrote:
>
> Dear Nisha,
>
> >
> > Attached v51 patch-set addressing all comments in [1] and [2].
> >
>
> Thanks for working on the feature! I've stated to review the patch.
> Here are my comments - sorry if there are something which have already been discussed.
> The thread is too long to follow correctly.
>
> Comments for 0001
> =============
>
> 01. binary_upgrade_logical_slot_has_caught_up
>
> ISTM that error_if_invalid is set to true when the slot can be moved forward, otherwise
> it is set to false. Regarding the binary_upgrade_logical_slot_has_caught_up, however,
> only valid slots will be passed to the funciton (see pg_upgrade/info.c) so I feel
> it is OK to set to true. Thought?
>

Right, corrected the call with error_if_invalid as true.

> Comments for 0002
> =============
>
> 03. check_replication_slot_inactive_timeout
>
> Can we overwrite replication_slot_inactive_timeout to zero when pg_uprade (and also
> pg_createsubscriber?) starts a server process? Several parameters have already been
> specified via -c option at that time. This can avoid an error while the upgrading.
> Note that this part is still needed even if you accept the comment. Users can
> manually boot with upgrade mode.
>

Done.

> 06. found bug
>
> While testing the patch, I found that slots can be invalidated too early when when
> the GUC is quite large. I think because an overflow is caused in InvalidatePossiblyObsoleteSlot().
>
> - Reproducer
>
> I set the replication_slot_inactive_timeout to INT_MAX and executed below commands,
> and found that the slot is invalidated.
>
> ```
> postgres=# SHOW replication_slot_inactive_timeout;
>  replication_slot_inactive_timeout
> -----------------------------------
>  2147483647s
> (1 row)
> postgres=# SELECT * FROM pg_create_logical_replication_slot('test', 'test_decoding');
>  slot_name |    lsn
> -----------+-----------
>  test      | 0/18B7F38
> (1 row)
> postgres=# CHECKPOINT ;
> CHECKPOINT
> postgres=# SELECT slot_name, inactive_since, invalidation_reason FROM pg_replication_slots ;
>  slot_name |        inactive_since         | invalidation_reason
> -----------+-------------------------------+---------------------
>  test      | 2024-11-28 07:50:25.927594+00 | inactive_timeout
> (1 row)
> ```
>
> - analysis
>
> In InvalidatePossiblyObsoleteSlot(), replication_slot_inactive_timeout_sec * 1000
> is passed to the third argument of TimestampDifferenceExceeds(), which is also the
> integer datatype. This causes an overflow and parameter is handled as the small
> value.
>
> - solution
>
> I think there are two possible solutions. You can choose one of them:
>
> a. Make the maximum INT_MAX/1000, or
> b. Change the unit to millisecond.
>

Fixed. It is reasonable to align with other timeout parameters by
using milliseconds as the unit.

--
Thanks,
Nisha



pgsql-hackers by date:

Previous
From: Nisha Moond
Date:
Subject: Re: Introduce XID age and inactive timeout based replication slot invalidation
Next
From: Nisha Moond
Date:
Subject: Re: Introduce XID age and inactive timeout based replication slot invalidation