Re: LISTEN/NOTIFY bug: VACUUM sets frozenxid past a xid in async queue - Mailing list pgsql-hackers

From Joel Jacobson
Subject Re: LISTEN/NOTIFY bug: VACUUM sets frozenxid past a xid in async queue
Msg-id 7726d706-4a11-4747-900e-ea27f8de9b65@app.fastmail.com
In response to Re: LISTEN/NOTIFY bug: VACUUM sets frozenxid past a xid in async queue  ("Matheus Alcantara" <matheusssilv97@gmail.com>)
List pgsql-hackers
On Wed, Oct 22, 2025, at 02:16, Matheus Alcantara wrote:
>> Regarding the v8 patch, it introduces a fundamentally new way of
>> managing notification entries (adding entries with 'committed' state
>> and marking them 'aborted' in abort paths). This affects all use
>> cases, not just those involving very old unconsumed notifications, and
>> could introduce more serious bugs like PANIC or SEGV. For
>> backpatching, I prefer targeting just the problematic behavior while
>> leaving unrelated parts unchanged. Though Álvaro might have a
>> different perspective on this.
>>
> Thanks very much for this explanation and for what you previously
> wrote in [1]. It's clear to me now that the v8 architecture is not a
> good way to go.

How about doing some more work in vac_update_datfrozenxid()?

Pseudo-code sketch:

```
void
vac_update_datfrozenxid(void)
{

    /* After computing newFrozenXid from all known sources... */

    TransactionId oldestNotifyXid = GetOldestQueuedNotifyXid();

    if (TransactionIdIsValid(oldestNotifyXid) &&
        TransactionIdPrecedes(oldestNotifyXid, newFrozenXid))
    {
        /*
         * The async queue has XIDs older than our proposed freeze point.
         * Attempt cleanup, then back off and let the next VACUUM benefit.
         */

        if (asyncQueueHasListeners())
        {
            /*
             * Wake every listening backend, in *all* databases, that is
             * not already at QUEUE_HEAD.
             * They'll hopefully process their pending notifications and
             * advance their pointers, allowing the next VACUUM to freeze
             * further.
             */
            asyncQueueWakeAllListeners();
        }
        else
        {
            /*
             * No listeners exist, so discard all unread notifications.
             * The next VACUUM should then succeed in advancing datfrozenxid.
             * asyncQueueAdvanceTailNoListeners() would take an exclusive
             * lock on NotifyQueueLock before checking
             * QUEUE_FIRST_LISTENER == INVALID_PROC_NUMBER.
             */
            asyncQueueAdvanceTailNoListeners();
        }

        /*
         * Back off datfrozenxid to protect the old XIDs.
         * The cleanup we just performed should allow the next VACUUM
         * to freeze further.
         */
        newFrozenXid = oldestNotifyXid;
    }
}
```
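
To make the back-off step concrete, here is a minimal standalone model of just the comparison and clamp, outside the server. It is only a sketch: `xid_is_valid()`, `xid_precedes()` and `clamp_frozen_xid()` are hypothetical stand-ins I made up for illustration (the first two mimic what `TransactionIdIsValid()` and `TransactionIdPrecedes()` do), and nothing here models the queue scan that `GetOldestQueuedNotifyXid()` would need.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TransactionId;

#define InvalidTransactionId     ((TransactionId) 0)
#define FirstNormalTransactionId ((TransactionId) 3)

/* Hypothetical stand-in for TransactionIdIsValid(). */
static bool
xid_is_valid(TransactionId xid)
{
    return xid != InvalidTransactionId;
}

/*
 * Hypothetical stand-in for TransactionIdPrecedes(): normal XIDs are
 * compared modulo 2^32, so the answer stays correct across wraparound.
 * Special (non-normal) XIDs fall back to a plain unsigned comparison.
 */
static bool
xid_precedes(TransactionId a, TransactionId b)
{
    if (a < FirstNormalTransactionId || b < FirstNormalTransactionId)
        return a < b;
    return (int32_t) (a - b) < 0;
}

/*
 * Hypothetical helper: back off the proposed freeze point if the async
 * queue still holds an older XID; otherwise keep the proposal unchanged.
 */
static TransactionId
clamp_frozen_xid(TransactionId newFrozenXid, TransactionId oldestNotifyXid)
{
    if (xid_is_valid(oldestNotifyXid) &&
        xid_precedes(oldestNotifyXid, newFrozenXid))
        return oldestNotifyXid;
    return newFrozenXid;
}
```

With a proposed freeze point of 1000 and an oldest queued XID of 900, the helper returns 900. With an empty queue (`InvalidTransactionId`) or a queued XID newer than the proposal, it returns 1000 unchanged.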

This might not solve every problematic situation, but it seems to me
that these measures could help in many of them. Or am I missing some
crucial insight here?

/Joel


