[PATCH] Add prepared_orphaned_transaction_timeout GUC - Mailing list pgsql-hackers

From Nikhil Chawla
Subject [PATCH] Add prepared_orphaned_transaction_timeout GUC
Date
Msg-id CAAXajwDOvTwLQ=rO5hOKsR_VTikST1rN-moO46YhYEgsO00dqg@mail.gmail.com
Whole thread
Responses Re: [PATCH] Add prepared_orphaned_transaction_timeout GUC
List pgsql-hackers
Hi,

Orphaned prepared transactions cause escalating harm the longer they persist:

  1. Lock retention :  All locks (row-level, table-level, advisory) acquired during the transaction are held indefinitely, blocking concurrent DML and DDL
  2. VACUUM blockage : The prepared transaction's XID becomes the oldest running transaction, preventing VACUUM from freezing tuples or reclaiming dead rows across the entire cluster, leading to table and index bloat
  3. Transaction ID wraparound risk : In extreme cases, the frozen XID horizon cannot advance, eventually threatening XID wraparound shutdown
  4. Resource consumption : Shared memory slots (max_prepared_transactions) remain occupied; the WAL records for the prepared state persist
Today, the only remediation is manual intervention: a DBA must discover the orphan (via pg_prepared_xacts), determine it's truly abandoned, and issue ROLLBACK PREPARED. PostgreSQL already has timeout-based safety nets for other "stuck" session states such as, idle_in_transaction_session_timeout, idle_session_timeout, statement_timeout, but no equivalent for prepared transactions. This patch fills that gap.

How it works ?


CleanupOrphanedPreparedTransactions():

  Phase 1 — Collect candidates (under TwoPhaseStateLock, shared mode):
    for each GlobalTransactionData (gxact) in TwoPhaseState:
      if gxact->valid AND
         TimestampDifferenceExceeds(gxact->prepared_at, now, timeout):
        save gxact->gid to candidate list

  Phase 2 — Roll back each candidate (lock released):
    for each saved GID:
      lock = LockGXactForCleanup(gid)
      if lock succeeded:
        FinishPreparedTransaction(gid, isCommit=false)
        log: "rolling back orphaned prepared transaction %s"

Safety Properties

  1. Timeout = 0 (default): Feature is completely disabled, no behavior change from default PostgreSQL
  2. No false positives on active transactions: The check uses prepared_at, which is set once at PREPARE TRANSACTION time. A transaction that is actively being committed/rolled back by a client will either complete before the timeout or be skipped by LockGXactForCleanup (which returns NULL if the gxact is already locked by another backend)
  3. Crash-safe: If the checkpointer crashes during cleanup, the prepared transaction's WAL state is unchanged, it remains prepared and will be cleaned up after recovery
  4. Idempotent: If the GID was already resolved between Phase 1 and Phase 2, LockGXactForCleanup returns NULL and the cleanup is silently skipped

--
Regards,
Nikhil Chawla

Attachment

pgsql-hackers by date:

Previous
From: Jakub Wartak
Date:
Subject: Re: log XLogPrefetch stats at end of recovery
Next
From: Peter Eisentraut
Date:
Subject: Re: Trying out