RE: Introduce XID age and inactive timeout based replication slot invalidation - Mailing list pgsql-hackers
From | Hayato Kuroda (Fujitsu) |
---|---|
Subject | RE: Introduce XID age and inactive timeout based replication slot invalidation |
Date | |
Msg-id | TYAPR01MB56927564EEE26E5433198405F5292@TYAPR01MB5692.jpnprd01.prod.outlook.com Whole thread Raw |
In response to | Re: Introduce XID age and inactive timeout based replication slot invalidation (Peter Smith <smithpb2250@gmail.com>) |
Responses |
Re: Introduce XID age and inactive timeout based replication slot invalidation
|
List | pgsql-hackers |
Dear Nisha, > > Attached v51 patch-set addressing all comments in [1] and [2]. > Thanks for working on the feature! I've stated to review the patch. Here are my comments - sorry if there are something which have already been discussed. The thread is too long to follow correctly. Comments for 0001 ============= 01. binary_upgrade_logical_slot_has_caught_up ISTM that error_if_invalid is set to true when the slot can be moved forward, otherwise it is set to false. Regarding the binary_upgrade_logical_slot_has_caught_up, however, only valid slots will be passed to the funciton (see pg_upgrade/info.c) so I feel it is OK to set to true. Thought? 02. ReplicationSlotAcquire According to other functions, we are adding to a note to the translator when parameters represent some common nouns, GUC names. I feel we should add a comment for RS_INVAL_WAL_LEVEL part based on it. Comments for 0002 ============= 03. check_replication_slot_inactive_timeout Can we overwrite replication_slot_inactive_timeout to zero when pg_uprade (and also pg_createsubscriber?) starts a server process? Several parameters have already been specified via -c option at that time. This can avoid an error while the upgrading. Note that this part is still needed even if you accept the comment. Users can manually boot with upgrade mode. 04. ReplicationSlotAcquire Same comment as 02. 05. ReportSlotInvalidation Same comment as 02. 06. found bug While testing the patch, I found that slots can be invalidated too early when when the GUC is quite large. I think because an overflow is caused in InvalidatePossiblyObsoleteSlot(). - Reproducer I set the replication_slot_inactive_timeout to INT_MAX and executed below commands, and found that the slot is invalidated. ``` postgres=# SHOW replication_slot_inactive_timeout; replication_slot_inactive_timeout ----------------------------------- 2147483647s (1 row) postgres=# SELECT * FROM pg_create_logical_replication_slot('test', 'test_decoding'); slot_name | lsn -----------+----------- test | 0/18B7F38 (1 row) postgres=# CHECKPOINT ; CHECKPOINT postgres=# SELECT slot_name, inactive_since, invalidation_reason FROM pg_replication_slots ; slot_name | inactive_since | invalidation_reason -----------+-------------------------------+--------------------- test | 2024-11-28 07:50:25.927594+00 | inactive_timeout (1 row) ``` - analysis In InvalidatePossiblyObsoleteSlot(), replication_slot_inactive_timeout_sec * 1000 is passed to the third argument of TimestampDifferenceExceeds(), which is also the integer datatype. This causes an overflow and parameter is handled as the small value. - solution I think there are two possible solutions. You can choose one of them: a. Make the maximum INT_MAX/1000, or b. Change the unit to millisecond. Best regards, Hayato Kuroda FUJITSU LIMITED
pgsql-hackers by date: