On Mon, Mar 11, 2024 at 3:44 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> Yes, your understanding is correct. I wanted us to consider having new
> parameters like 'inactive_replication_slot_timeout' to be at
> slot-level instead of GUC. I think this new parameter doesn't seem to
> be the similar as 'max_slot_wal_keep_size' which leads to truncation
> of WAL at global and then invalidates the appropriate slots. OTOH, the
> 'inactive_replication_slot_timeout' doesn't appear to have a similar
> global effect.
last_inactive_at is tracked for each slot using which slots get
invalidated based on inactive_replication_slot_timeout. It's like
max_slot_wal_keep_size invalidating slots based on restart_lsn. In a
way, both are similar, right?
> The other thing we should consider is what if the
> checkpoint happens at a timeout greater than
> 'inactive_replication_slot_timeout'?
In such a case, the slots get invalidated upon the next checkpoint as
the (current_checkpointer_timeout - last_inactive_at) will then be
greater than inactive_replication_slot_timeout.
> Shall, we consider doing it via
> some other background process or do we think checkpointer is the best
> we can have?
The same problem exists if we do it with some other background
process. I think the checkpointer is best because it already
invalidates slots for wal_removed cause, and flushes all replication
slots to disk. Moving this new invalidation functionality into some
other background process such as autovacuum will not only burden that
process' work but also mix up the unique functionality of that
background process.
Having said above, I'm open to ideas from others as I'm not so sure if
there's any issue with checkpointer invalidating the slots for new
reasons.
--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com