Re: LISTEN/NOTIFY bug: VACUUM sets frozenxid past a xid in async queue - Mailing list pgsql-hackers

From Masahiko Sawada
Subject Re: LISTEN/NOTIFY bug: VACUUM sets frozenxid past a xid in async queue
Date
Msg-id CAD21AoAXnVEegOrqj84rg_iRr15-UEOpn_vxZFOBfMEPMcOFEA@mail.gmail.com
In response to Re: LISTEN/NOTIFY bug: VACUUM sets frozenxid past a xid in async queue  ("Matheus Alcantara" <matheusssilv97@gmail.com>)
Responses Re: LISTEN/NOTIFY bug: VACUUM sets frozenxid past a xid in async queue
List pgsql-hackers
On Tue, Oct 21, 2025 at 4:16 PM Matheus Alcantara
<matheusssilv97@gmail.com> wrote:
>
> On 21/10/25 18:42, Masahiko Sawada wrote:
> > On Mon, Oct 20, 2025 at 11:19 AM Matheus Alcantara
> > <matheusssilv97@gmail.com> wrote:
> >>
> >> On Mon Oct 20, 2025 at 11:18 AM -03, Álvaro Herrera wrote:
> >>> On 2025-Oct-20, Matheus Alcantara wrote:
> >>>
> >>>> This is similar to what was already proposed at [1]. This approach was
> >>>> abandoned because a notification on the queue may block datfrozenxid
> >>>> advance and clog truncation which can cause other issues for the users [2].
> >>>
> >>> Well, I think that this is the right solution for backpatching, and that
> >>> you were wrong to abandon it.  You can continue to design a better
> >>> mechanism for the master branch, but in old branches we cannot really do
> >>> all those things you're proposing to do.
> >>>
> >> I actually would prefer this approach TBH, but since this can cause
> >> other issues like transaction wraparound due to unconsumed
> >> notifications we would need other mechanisms to prevent that, and I'm
> >> not sure if users should expect this kind of behavior change on minor
> >> version updates?
> >
> > True, unconsumed notifications could cause transaction wraparound by
> > preventing datfrozenxid from advancing. However, this risk only
> > applies when users have long-term unconsumed notifications, which is
> > uncommon. That said, we should note that, as I mentioned
> > previously[1], a process can accumulate unconsumed notifications
> > simply by being in idle-in-transaction state, even without
> > backend_xmin and backend_xid, which prevents datfrozenxid from
> > advancing. While this might not be problematic in practice if it's
> > rare, I find it concerning that we have no way to check the age of
> > unconsumed notifications.
> >
> Ok, I think that I was too conservative when thinking about the
> transaction wraparound issue that this could cause. I agree that this
> seems an uncommon scenario.
>
> >> I think that to go with this solution we would need some way to drop too
> >> old notifications from the queue to advance the datfrozenxid, so I
> >> imagine that we would need some GUC to make this configurable. We can
> >> configure a default value of course, but it may not be the best
> >> configuration for some use cases. Is this something that users should
> >> expect to deal with on minor version updates?
> >
> > I think adding a new GUC would be overkill for this fix. As for
> > dropping old notifications from the queue, we probably don't need to
> > make it configurable - we could simply drop notifications whose commit
> > status is no longer available (instead of raising an error).
> >
> IIUC this is about not making vacuum freeze consider the oldest
> xid on the queue, but just removing notifications whose transaction
> status is no longer available, right? Since currently, when the error
> happens, we already can't process the notifications, it seems a
> reasonable way to go IMO.

On second thought, simply hiding the error would be worse than our
current behavior. Users wouldn't know their notifications are being
dropped, as they often don't check WARNINGs. The more frequently they
try to freeze XIDs, the more notifications they'd lose. To avoid
silent discards, they would need to increase
autovacuum_freeze_max_age to accommodate more clog entries, but
this increases the risk of XID wraparound. I think the proposed
approach of modifying vacuum freeze to consider the oldest XID on the
queue would be better. This has a downside as I mentioned: processes
in idle-in-transaction state even without backend_xmin and backend_xid
can still accumulate unconsumed notifications. However, leaving
transactions in idle-in-transaction state for a long time is bad
practice anyway. While we might want to consider adding a safeguard
for this case, I guess it would rarely occur in practice.
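
To make the idea concrete, here is a rough sketch of what "consider the
oldest XID on the queue" could mean when choosing a freeze cutoff. It is
only an illustration under stated assumptions, not the actual patch: the
function names (xid_precedes, oldest_queue_xid, clamp_freeze_cutoff) are
hypothetical, the queue is modeled as a plain list of XIDs, and the
comparison is a simplified modulo-2^32 check for normal XIDs only.

```python
from typing import Iterable, Optional

XID_MODULUS = 2 ** 32  # XIDs are 32-bit counters that wrap around


def xid_precedes(a: int, b: int) -> bool:
    """Wraparound-aware 'a is older than b' for normal XIDs.

    Simplified model: a precedes b when the signed 32-bit difference
    (a - b) is negative. Special (non-normal) XIDs are not handled.
    """
    diff = (a - b) % XID_MODULUS
    return diff >= 2 ** 31


def oldest_queue_xid(queue_xids: Iterable[int]) -> Optional[int]:
    """Return the oldest XID among unconsumed notifications, if any."""
    oldest: Optional[int] = None
    for xid in queue_xids:
        if oldest is None or xid_precedes(xid, oldest):
            oldest = xid
    return oldest


def clamp_freeze_cutoff(proposed_cutoff: int, queue_xids: Iterable[int]) -> int:
    """Hold the freeze cutoff back so it never passes a queued XID.

    Hypothetical helper: if some unconsumed notification carries an XID
    older than the proposed cutoff, use that XID as the cutoff instead,
    so its commit status stays available in clog.
    """
    oldest = oldest_queue_xid(queue_xids)
    if oldest is not None and xid_precedes(oldest, proposed_cutoff):
        return oldest
    return proposed_cutoff
```

In this model a single long-unconsumed notification keeps the cutoff
pinned at its XID, which is exactly the downside described above.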

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com


