Hi,
On Tue, Mar 24, 2026 at 11:50 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> > Please find the v3 patch for further review.
>
> Thank you for updating the patch. I think the patch is reasonably
> simple and can avoid unnecessary overheads well due to XID-based
> checks. Here are some comments:
Thank you for reviewing the patch.
> vacuum_get_cutoff() is also called by VACUUM FULL, CLUSTER, and
> REPACK. I'm not sure that users would expect the slot invalidation
> also in these commands. I think it's better to leave
> vacuum_get_cutoff() a pure cutoff computation function and we can try
> to invalidate slots in heap_vacuum_rel(). It requires additional
> ReadNextTransactionId() but we can live with it, or we can make
> vacuum_get_cutoffs() return the nextXID as well (stored in *cutoffs).
+1. I chose to perform the slot invalidation in heap_vacuum_rel() by
getting the next XID and calling vacuum_get_cutoffs() again when a
slot gets invalidated. IMHO, this is simpler than adding a flag and
doing the invalidation selectively in vacuum_get_cutoffs().
> if (TransactionIdPrecedes(oldestXmin, cutoffXID))
> + {
> + invalidated = InvalidateObsoleteReplicationSlots(RS_INVAL_XID_AGE,
> + 0,
> + InvalidOid,
> + InvalidTransactionId,
> + nextXID);
> + }
>
> I think it's better to check the procArray->replication_slot_xmin and
> procArray->replication_slot_catalog_xmin before iterating over each
> slot. Otherwise, we would end up checking every slot even when a long
> running transaction holds the oldestxmin back.
+1. Changed.
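To illustrate the idea of the pre-check, here is a standalone sketch. It is
not the patch code: the types are simplified stand-ins for the PostgreSQL
definitions, slot_scan_needed() is a hypothetical helper name, and
xid_precedes() only handles normal XIDs. The point is that the two aggregate
values are compared against the cutoff first, so the per-slot loop is skipped
when no slot can be older than the cutoff.

```c
#include <stdbool.h>
#include <stdint.h>

/* Simplified stand-ins for the PostgreSQL definitions. */
typedef uint32_t TransactionId;

#define FirstNormalTransactionId   ((TransactionId) 3)
#define TransactionIdIsNormal(xid) ((xid) >= FirstNormalTransactionId)

/*
 * Modulo-2^32 comparison, valid for normal XIDs only:
 * true if id1 is logically older than id2.
 */
static bool
xid_precedes(TransactionId id1, TransactionId id2)
{
    return (int32_t) (id1 - id2) < 0;
}

/*
 * Hypothetical helper: given the aggregate slot xmin and catalog xmin
 * (the values tracked in procArray), report whether any slot could be
 * older than cutoff_xid.  Only then is a scan over all slots worthwhile.
 */
static bool
slot_scan_needed(TransactionId slot_xmin,
                 TransactionId slot_catalog_xmin,
                 TransactionId cutoff_xid)
{
    if (TransactionIdIsNormal(slot_xmin) &&
        xid_precedes(slot_xmin, cutoff_xid))
        return true;

    if (TransactionIdIsNormal(slot_catalog_xmin) &&
        xid_precedes(slot_catalog_xmin, cutoff_xid))
        return true;

    /* No slot holds back an XID older than the cutoff. */
    return false;
}
```

With this shape, a long-running transaction that holds oldestXmin back while
all slots are recent costs two comparisons instead of a pass over every slot.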
> + if (!TransactionIdIsNormal(cutoffXID))
> + cutoffXID = FirstNormalTransactionId;
>
> These codes have the same comment but are doing a slightly different
> thing. I guess the latter is missing '-'?
Fixed the typo.
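For reference, the intended computation at both places is "next XID minus
the configured age, clamped to the first normal XID". A standalone sketch,
again with simplified stand-in types and a hypothetical function name
(compute_age_cutoff() is not in the patch):

```c
#include <stdint.h>

/* Simplified stand-ins for the PostgreSQL definitions. */
typedef uint32_t TransactionId;

#define FirstNormalTransactionId   ((TransactionId) 3)
#define TransactionIdIsNormal(xid) ((xid) >= FirstNormalTransactionId)

/*
 * Hypothetical helper: compute the XID below which a slot's xmin is
 * considered too old.  The subtraction wraps modulo 2^32, so the result
 * can land on one of the special XIDs (0, 1, 2); in that case it is
 * clamped to FirstNormalTransactionId.
 */
static TransactionId
compute_age_cutoff(TransactionId next_xid, uint32_t max_age)
{
    TransactionId cutoff = next_xid - max_age;

    if (!TransactionIdIsNormal(cutoff))
        cutoff = FirstNormalTransactionId;

    return cutoff;
}
```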
I also fixed a test failure reported by CI.
Please find the attached v4 patch for further review.
I've also attached the 0002 patch, which adds a test case demonstrating
a production-like scenario. It pushes the database to the XID
wraparound limits and checks whether the XID-age based invalidation
works correctly with the GUC set to the default vacuum_failsafe_age
value of 1.6B, and whether autovacuum can then remove the replication
slot that is blocking it, proceed with freezing, and bring the database
back to normal. I don't intend to get this committed unless others
think otherwise, but I wanted to have it as a reference.
--
Bharath Rupireddy
Amazon Web Services: https://aws.amazon.com