Re: Introduce XID age and inactive timeout based replication slot invalidation - Mailing list pgsql-hackers
From | Shlok Kyal |
---|---|
Subject | Re: Introduce XID age and inactive timeout based replication slot invalidation |
Date | |
Msg-id | CANhcyEWgb8WquGM34xHCknAQxVVp0yYBCsOtEgo5r0Y9aimFAw@mail.gmail.com Whole thread Raw |
In response to | Re: Introduce XID age and inactive timeout based replication slot invalidation (vignesh C <vignesh21@gmail.com>) |
Responses |
Re: Introduce XID age and inactive timeout based replication slot invalidation
|
List | pgsql-hackers |
On Fri, 31 Jan 2025 at 17:50, Nisha Moond <nisha.moond412@gmail.com> wrote: > > On Fri, Jan 31, 2025 at 2:32 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > On Fri, Jan 31, 2025 at 10:40 AM Peter Smith <smithpb2250@gmail.com> wrote: > > > > > > ====== > > > src/backend/replication/slot.c > > > > > > ReportSlotInvalidation: > > > > > > 1. > > > + > > > + case RS_INVAL_IDLE_TIMEOUT: > > > + Assert(inactive_since > 0); > > > + /* translator: second %s is a GUC variable name */ > > > + appendStringInfo(&err_detail, > > > + _("The slot has remained idle since %s, which is longer than the > > > configured \"%s\" duration."), > > > + timestamptz_to_str(inactive_since), > > > + "idle_replication_slot_timeout"); > > > + break; > > > + > > > > > > errdetail: > > > > > > I guess it is no fault of this patch because I see you've only copied > > > nearby code, but AFAICT this function is still having an each-way bet > > > by using a mixture of _() macro which is for strings intended be > > > translated, but then only using them in errdetail_internal() which is > > > for strings that are NOT intended to be translated. Isn't it > > > contradictory? Why don't we use errdetail() here? > > > > > > > Your question is valid and I don't have an answer. I encourage you to > > start a new thread to clarify this. > > > > > errhint: > > > > > > Also, the way the 'hint' is implemented can only be meaningful for > > > RS_INVAL_WAL_REMOVED. This is also existing code that IMO it was > > > always strange, but now that this patch has added another kind of > > > switch (cause) this hint implementation now looks increasingly hacky > > > to me; it is also inflexible -- e.g. if you ever wanted to add > > > different hints. A neater implementation would be to make the code > > > more like how the err_detail is handled, so then the errhint string > > > would only be assigned within the "case RS_INVAL_WAL_REMOVED:" > > > > > > > This makes sense to me. > > > > + > > + case RS_INVAL_IDLE_TIMEOUT: > > + Assert(inactive_since > 0); > > + /* translator: second %s is a GUC variable name */ > > + appendStringInfo(&err_detail, > > + _("The slot has remained idle since %s, which is longer than the > > configured \"%s\" duration."), > > + timestamptz_to_str(inactive_since), > > + "idle_replication_slot_timeout"); > > > > I think the above message should be constructed on a model similar to > > the following nearby message:"The slot's restart_lsn %X/%X exceeds the > > limit by %llu bytes.". So, how about the following: "The slot's idle > > time %s exceeds the configured \"%s\" duration"? > > > > Also, similar to max_slot_wal_keep_size, we should give a hint in this > > case to increase idle_replication_slot_timeout. > > > > It is not clear why the injection point test is doing > > pg_sync_replication_slots() etc. in the patch. The test should be > > simple such that after creating a new physical or logical slot, enable > > the injection point, then run the manual checkpoint command, and check > > the invalidation status of the slot. > > > > Thanks for the review! I have incorporated the above comments. The > test in patch-002 has been optimized as suggested and now completes in > less than a second. > Please find the attached v66 patch set. The base patch(v65-001) is > committed now, so I have rebased the patches. > > Thank you, Kuroda-san, for working on patch-002. > Hi Nisha, I reviewed the v66 patch. I have few comments: 1. I also feel the default value should be set to '0' as suggested by Vignesh in 1st point of [1]. 2. Should we allow copying of invalidated slots? Currently we are able to copy slots which are invalidated: postgres=# select slot_name, active, restart_lsn, wal_status, inactive_since , invalidation_reason from pg_replication_slots; slot_name | active | restart_lsn | wal_status | inactive_since | invalidation_reason -----------+--------+-------------+------------+----------------------------------+--------------------- test1 | f | 0/16FDDE0 | lost | 2025-02-03 18:28:01.802463+05:30 | idle_timeout (1 row) postgres=# select pg_copy_logical_replication_slot('test1', 'test2'); pg_copy_logical_replication_slot ---------------------------------- (test2,0/16FDE18) (1 row) postgres=# select slot_name, active, restart_lsn, wal_status, inactive_since , invalidation_reason from pg_replication_slots; slot_name | active | restart_lsn | wal_status | inactive_since | invalidation_reason -----------+--------+-------------+------------+----------------------------------+--------------------- test1 | f | 0/16FDDE0 | lost | 2025-02-03 18:28:01.802463+05:30 | idle_timeout test2 | f | 0/16FDDE0 | reserved | 2025-02-03 18:29:53.478023+05:30 | (2 rows) 3. We have similar behaviour as above for physical slots. [1]: https://www.postgresql.org/message-id/CALDaNm14QrW5j6su%2BEAqjwnHbiwXJwO%2Byk73_%3D7yvc5TVY-43g%40mail.gmail.com Thanks and Regards, Shlok Kyal
pgsql-hackers by date: