On Thu, Nov 28, 2024 at 1:29 PM Hayato Kuroda (Fujitsu)
<kuroda.hayato@fujitsu.com> wrote:
>
> Dear Nisha,
>
> >
> > Attached v51 patch-set addressing all comments in [1] and [2].
> >
>
> Thanks for working on the feature! I've stated to review the patch.
> Here are my comments - sorry if there are something which have already been discussed.
> The thread is too long to follow correctly.
>
> Comments for 0001
> =============
>
> 01. binary_upgrade_logical_slot_has_caught_up
>
> ISTM that error_if_invalid is set to true when the slot can be moved forward, otherwise
> it is set to false. Regarding the binary_upgrade_logical_slot_has_caught_up, however,
> only valid slots will be passed to the funciton (see pg_upgrade/info.c) so I feel
> it is OK to set to true. Thought?
>
Right, corrected the call with error_if_invalid as true.
> Comments for 0002
> =============
>
> 03. check_replication_slot_inactive_timeout
>
> Can we overwrite replication_slot_inactive_timeout to zero when pg_uprade (and also
> pg_createsubscriber?) starts a server process? Several parameters have already been
> specified via -c option at that time. This can avoid an error while the upgrading.
> Note that this part is still needed even if you accept the comment. Users can
> manually boot with upgrade mode.
>
Done.
> 06. found bug
>
> While testing the patch, I found that slots can be invalidated too early when when
> the GUC is quite large. I think because an overflow is caused in InvalidatePossiblyObsoleteSlot().
>
> - Reproducer
>
> I set the replication_slot_inactive_timeout to INT_MAX and executed below commands,
> and found that the slot is invalidated.
>
> ```
> postgres=# SHOW replication_slot_inactive_timeout;
> replication_slot_inactive_timeout
> -----------------------------------
> 2147483647s
> (1 row)
> postgres=# SELECT * FROM pg_create_logical_replication_slot('test', 'test_decoding');
> slot_name | lsn
> -----------+-----------
> test | 0/18B7F38
> (1 row)
> postgres=# CHECKPOINT ;
> CHECKPOINT
> postgres=# SELECT slot_name, inactive_since, invalidation_reason FROM pg_replication_slots ;
> slot_name | inactive_since | invalidation_reason
> -----------+-------------------------------+---------------------
> test | 2024-11-28 07:50:25.927594+00 | inactive_timeout
> (1 row)
> ```
>
> - analysis
>
> In InvalidatePossiblyObsoleteSlot(), replication_slot_inactive_timeout_sec * 1000
> is passed to the third argument of TimestampDifferenceExceeds(), which is also the
> integer datatype. This causes an overflow and parameter is handled as the small
> value.
>
> - solution
>
> I think there are two possible solutions. You can choose one of them:
>
> a. Make the maximum INT_MAX/1000, or
> b. Change the unit to millisecond.
>
Fixed. It is reasonable to align with other timeout parameters by
using milliseconds as the unit.
--
Thanks,
Nisha