Re: Introduce XID age and inactive timeout based replication slot invalidation - Mailing list pgsql-hackers

From Bharath Rupireddy
Subject Re: Introduce XID age and inactive timeout based replication slot invalidation
Date
Msg-id CALj2ACUDD-QDxBGLRSSXhtzqGiMP2EaBPJyhGzc5CqpAm535xA@mail.gmail.com
Whole thread Raw
In response to Introduce XID age and inactive timeout based replication slot invalidation  (Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>)
Responses Re: Introduce XID age and inactive timeout based replication slot invalidation
List pgsql-hackers
On Thu, Jan 11, 2024 at 10:48 AM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
>
> Hi,
>
> Replication slots in postgres will prevent removal of required
> resources when there is no connection using them (inactive). This
> consumes storage because neither required WAL nor required rows from
> the user tables/system catalogs can be removed by VACUUM as long as
> they are required by a replication slot. In extreme cases this could
> cause the transaction ID wraparound.
>
> Currently postgres has the ability to invalidate inactive replication
> slots based on the amount of WAL (set via max_slot_wal_keep_size GUC)
> that will be needed for the slots in case they become active. However,
> the wraparound issue isn't effectively covered by
> max_slot_wal_keep_size - one can't tell postgres to invalidate a
> replication slot if it is blocking VACUUM. Also, it is often tricky to
> choose a default value for max_slot_wal_keep_size, because the amount
> of WAL that gets generated and allocated storage for the database can
> vary.
>
> Therefore, it is often easy for developers to do the following:
> a) set an XID age (age of slot's xmin or catalog_xmin) of say 1 or 1.5
> billion, after which the slots get invalidated.
> b) set a timeout of say 1 or 2 or 3 days, after which the inactive
> slots get invalidated.
>
> To implement (a), postgres needs a new GUC called max_slot_xid_age.
> The checkpointer then invalidates all the slots whose xmin (the oldest
> transaction that this slot needs the database to retain) or
> catalog_xmin (the oldest transaction affecting the system catalogs
> that this slot needs the database to retain) has reached the age
> specified by this setting.
>
> To implement (b), first postgres needs to track the replication slot
> metrics like the time at which the slot became inactive (inactive_at
> timestamptz) and the total number of times the slot became inactive in
> its lifetime (inactive_count numeric) in ReplicationSlotPersistentData
> structure. And, then it needs a new timeout GUC called
> inactive_replication_slot_timeout. Whenever a slot becomes inactive,
> the current timestamp and inactive count are stored in
> ReplicationSlotPersistentData structure and persisted to disk. The
> checkpointer then invalidates all the slots that are lying inactive
> for about inactive_replication_slot_timeout duration starting from
> inactive_at.
>
> In addition to implementing (b), these two new metrics enable
> developers to improve their monitoring tools as the metrics are
> exposed via pg_replication_slots system view. For instance, one can
> build a monitoring tool that signals when replication slots are lying
> inactive for a day or so using inactive_at metric, and/or when a
> replication slot is becoming inactive too frequently using inactive_at
> metric.
>
> I’m attaching the v1 patch set as described below:
> 0001 - Tracks invalidation_reason in pg_replication_slots. This is
> needed because slots now have multiple reasons for slot invalidation.
> 0002 - Tracks inactive replication slot information inactive_at and
> inactive_timeout.
> 0003 - Adds inactive_timeout based replication slot invalidation.
> 0004 - Adds XID based replication slot invalidation.
>
> Thoughts?

Needed a rebase due to c393308b. Please find the attached v2 patch set.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Attachment

pgsql-hackers by date:

Previous
From: "Tristan Partin"
Date:
Subject: Re: make dist using git archive
Next
From: "David G. Johnston"
Date:
Subject: Re: psql: add \create_function command