Re: Introduce XID age and inactive timeout based replication slot invalidation - Mailing list pgsql-hackers
From | vignesh C |
---|---|
Subject | Re: Introduce XID age and inactive timeout based replication slot invalidation |
Date | |
Msg-id | CALDaNm1J_mdqCYjQZgfQMVhJrxndPem5ruxpG_67t4C_2My9WQ@mail.gmail.com Whole thread Raw |
In response to | Re: Introduce XID age and inactive timeout based replication slot invalidation (vignesh C <vignesh21@gmail.com>) |
List | pgsql-hackers |
On Tue, 28 Jan 2025 at 17:28, Nisha Moond <nisha.moond412@gmail.com> wrote: > > On Tue, Jan 28, 2025 at 3:26 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > On Mon, Dec 30, 2024 at 11:05 AM Peter Smith <smithpb2250@gmail.com> wrote: > > > > > > I think we are often too quick to throw out perfectly good tests. > > > Citing that some similar GUCs don't do testing as a reason to skip > > > them just seems to me like an example of "two wrongs don't make a > > > right". > > > > > > There is a third option. > > > > > > Keep the tests. Because they take excessive time to run, that simply > > > means you should run them *conditionally* based on the PG_TEST_EXTRA > > > environment variable so they don't impact the normal BF execution. The > > > documentation [1] says this env var is for "resource intensive" tests > > > -- AFAIK this is exactly the scenario we find ourselves in, so is > > > exactly what this env var was meant for. > > > > > > Search other *.pl tests for PG_TEST_EXTRA to see some examples. > > > > > > > I don't see the long-running tests to be added under PG_TEST_EXTRA as > > that will make it unusable after some point. Now, if multiple senior > > members feel it is okay to add long-running tests under PG_TEST_EXTRA > > then I am open to considering it. We can keep this test as a separate > > patch so that the patch is being tested in CI or in manual tests > > before commit. > > > > Please find the attached v64 patches. The changes in this version > w.r.t. older patch v63 are as - > - The changes from the v63-0001 patch have been moved to a separate thread [1]. > - The v63-0002 patch has been split into two parts in v64: > 1) 001 patch: Implements the main feature - inactive timeout-based > slot invalidation. > 2) 002 patch: Separates the TAP test "044_invalidate_inactive_slots" > as suggested above. > > [1] https://www.postgresql.org/message-id/CABdArM6pBL5hPnSQ%2B5nEVMANcF4FCH7LQmgskXyiLY75TMnKpw%40mail.gmail.com Few comments: 1) We can mention about the slot that do not reserve WAL is also not applicable: + <para> + Note that the idle timeout invalidation mechanism is not + applicable for slots on the standby server that are being synced + from the primary server (i.e., standby slots having + <link linkend="view-pg-replication-slots">pg_replication_slots</link>.<structfield>synced</structfield> + value <literal>true</literal>). + Synced slots are always considered to be inactive because they don't + perform logical decoding to produce changes. + </para> 2) Similarly we can mention in the commit message also that it will not be considered for slot that do not reserve WAL: Note that the idle timeout invalidation mechanism is not applicable for slots on the standby server that are being synced from the primary server (i.e., standby slots having 'synced' field 'true'). Synced slots are always considered to be inactive because they don't perform logical decoding to produce changes. 3) Since idle_replication_slot_timeout is somewhat similar to max_slot_wal_keep_size, we can move idle_replication_slot_timeout after max_slot_wal_keep_size instead of keeping it after wal_sender_timeout. + <varlistentry id="guc-idle-replication-slot-timeout" xreflabel="idle_replication_slot_timeout"> + <term><varname>idle_replication_slot_timeout</varname> (<type>integer</type>) + <indexterm> + <primary><varname>idle_replication_slot_timeout</varname> configuration parameter</primary> + </indexterm> + </term> + <listitem> + <para> + Invalidate replication slots that have remained idle longer than this + duration. If this value is specified without units, it is taken as + minutes. A value of zero disables the idle timeout invalidation mechanism. + The default is one day. This parameter can only be set in the + <filename>postgresql.conf</filename> file or on the server command line. + </para> 4) We can try to keep it to less than 80 char wherever possible: a) Like in this case, "mechanism" can be moved to the next line: + duration. If this value is specified without units, it is taken as + minutes. A value of zero disables the idle timeout invalidation mechanism. b) Similarly here too, "slot's" can be moved to the next line: + inactive slots. The duration of slot inactivity is calculated using the slot's + <link linkend="view-pg-replication-slots">pg_replication_slots</link>.<structfield>inactive_since</structfield> 5) You can use new ereport style to exclude brackets around errcode: + ereport(ERROR, + (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE), + errmsg("can no longer get changes from replication slot \"%s\"", + NameStr(s->data.name)), + errdetail("This slot has been invalidated because it has remained idle longer than the configured \"%s\" duration.", + "idle_replication_slot_timeout"))); Regards, Vignesh
pgsql-hackers by date: