Re: Introduce XID age and inactive timeout based replication slot invalidation - Mailing list pgsql-hackers
From | Peter Smith |
---|---|
Subject | Re: Introduce XID age and inactive timeout based replication slot invalidation |
Date | |
Msg-id | CAHut+Pvx294U-XBB6-BvabesUNxbnuDQmk-VOFm=pbcNWSsHvQ@mail.gmail.com Whole thread Raw |
In response to | Re: Introduce XID age and inactive timeout based replication slot invalidation (vignesh C <vignesh21@gmail.com>) |
Responses |
Re: Introduce XID age and inactive timeout based replication slot invalidation
|
List | pgsql-hackers |
Hi Nisha. Here are some review comments for patch v54-0002. (I had also checked patch v54-0001, but have no further review comments for that one). ====== doc/src/sgml/config.sgml 1. + <para> + Slot invalidation due to idle timeout occurs during checkpoint. + If the <varname>checkpoint_timeout</varname> exceeds + <varname>idle_replication_slot_timeout</varname>, the slot + invalidation will be delayed until the next checkpoint is triggered. + To avoid delays, users can force a checkpoint to promptly invalidate + inactive slots. The duration of slot inactivity is calculated using the slot's + <link linkend="view-pg-replication-slots">pg_replication_slots</link>.<structfield>inactive_since</structfield> + value. + </para> + The wording of "If the checkpoint_timeout exceeds idle_replication_slot_timeout, the slot invalidation will be delayed until the next checkpoint is triggered." seems slightly misleading, because AFAIK it is not conditional on the GUC value differences like that -- i.e. slot invalidation is *always* delayed until the next checkpoint occurs. SUGGESTION: Slot invalidation due to idle timeout occurs during checkpoint. Because checkpoints happen at checkpoint_timeout intervals, there can be some lag between when the idle_replication_slot_timeout was exceeded and when the slot invalidation is triggered at the next checkpoint. To avoid such lags, users can force... ======= src/backend/replication/slot.c 2. GENERAL +/* Invalidate replication slots idle beyond this time; '0' disables it */ +int idle_replication_slot_timeout_ms = 0; I noticed this patch is using a variety of ways of describing the same thing: * guc var: Invalidate replication slots idle beyond this time... * guc_tables: ... the amount of time a replication slot can remain idle before it will be invalidated. * docs: means that the slot has remained idle beyond the duration specified by the idle_replication_slot_timeout parameter * errmsg: ... slot has been invalidated because inactivity exceeded the time limit set by ... * etc.. They are all the same, but they are all worded slightly differently: * "idle" vs "inactivity" vs ... * "time" vs "amount of time" vs "duration" vs "time limit" vs ... There may not be a one-size-fits-all, but still, it might be better to try to search for all different phrasing and use common wording as much as possible. ~~~ CheckPointReplicationSlots: 3. + * XXX: Slot invalidation due to 'idle_timeout' occurs only for + * released slots, based on 'idle_replication_slot_timeout'. Active + * slots in use for replication are excluded, preventing accidental + * invalidation. Slots where communication between the publisher and + * subscriber is down are also excluded, as they are managed by the + * 'wal_sender_timeout'. Maybe a slight rewording like below is better. Maybe not. YMMV. SUGGESTION: XXX: Slot invalidation due to 'idle_timeout' applies only to released slots, and is based on the 'idle_replication_slot_timeout' GUC. Active slots currently in use for replication are excluded to prevent accidental invalidation. Slots... ====== src/bin/pg_upgrade/server.c 4. + /* + * Use idle_replication_slot_timeout=0 to prevent slot invalidation due to + * inactive_timeout by checkpointer process during upgrade. + */ + if (GET_MAJOR_VERSION(cluster->major_version) >= 1800) + appendPQExpBufferStr(&pgoptions, " -c idle_replication_slot_timeout=0"); + /inactive_timeout/idle_timeout/ ====== src/test/recovery/t/043_invalidate_inactive_slots.pl 5. +# Wait for slot to first become idle and then get invalidated +sub wait_for_slot_invalidation +{ + my ($node, $slot, $offset, $idle_timeout) = @_; + my $node_name = $node->name; AFAICT this 'idle_timeout' parameter is passed units of "seconds", so it would be better to call it something like 'idle_timeout_s' to make the units clear. ~~~ 6. +# Trigger slot invalidation and confirm it in the server log +sub trigger_slot_invalidation +{ + my ($node, $slot, $offset, $idle_timeout) = @_; + my $node_name = $node->name; + my $invalidated = 0; Ditto above review comment #5 -- better to call it something like 'idle_timeout_s' to make the units clear. ====== Kind Regards, Peter Smith. Fujitsu Australia
pgsql-hackers by date: