Re: LISTEN/NOTIFY bug: VACUUM sets frozenxid past a xid in async queue - Mailing list pgsql-hackers

From Masahiko Sawada
Subject Re: LISTEN/NOTIFY bug: VACUUM sets frozenxid past a xid in async queue
Date
Msg-id CAD21AoCFZxXCBy+5DoarfG9LC9VdNwWRDpDHE5sdTh5Ym0EcqQ@mail.gmail.com
Whole thread Raw
In response to Re: LISTEN/NOTIFY bug: VACUUM sets frozenxid past a xid in async queue  ("Matheus Alcantara" <matheusssilv97@gmail.com>)
Responses Re: LISTEN/NOTIFY bug: VACUUM sets frozenxid past a xid in async queue
List pgsql-hackers
On Mon, Oct 20, 2025 at 5:37 AM Matheus Alcantara
<matheusssilv97@gmail.com> wrote:
>
> On Sat Oct 18, 2025 at 2:43 AM -03, Joel Jacobson wrote:
> > On Fri, Oct 17, 2025, at 22:50, Arseniy Mukhin wrote:
> > What a funny coincidence that the approach in this patch,
> > has one similarity with the "Direct advancement" approach
> > in the patch in the "Optimize LISTEN/NOTIFY" [1] thread,
> > namely that we're both interested in QUEUE_HEAD before/after
> > we push the notifications into the queue, in PreCommit_Notify().
> >
> > It looks to me like our new data structures are Interchangeable,
> > so I guess we probably want both patches to eventually settle
> > on one and the same?
> >
> > The differences I note between our queue head before/after code are:
> >
> > - In this patch, you are palloc'ing a struct with two fields.
> >   In [1], we're using two separate static QueuePosition variables.
> >
> > - In this patch, you are taking/releasing a shared lock before/after
> >   the loop to read QUEUE_HEAD and set previousHead/head.
> >   In [1], we avoid the need of the shared lock, by doing the reads
> >   within the existing exclusive lock inside the loop, but instead
> >   therefore need a firstIteration bool, to know which is the first
> >   iteration, and need to overwrite the after-var in each iteration.
> >
> > I don't think the noted differences above matter, both seems fine.
> >
> Yeah, I also think that both approach seems fine. I keep the v8 version
> with the palloc, if someone has any concern about this I'm open to
> switch to another approach.
>
> > Another thing I noticed in your patch that made me wonder,
> > is the naming of the new AsyncQueueEntry bool field,
> > which is given the name "committed".
> >
> > I think this name is not entirely faithful, since when set to true,
> > the entry has not been committed yet.
> >
> > How about negating the meaning of this boolean field?
> > To instead indicate when the entry has been rollbacked.
> > Then, it would clearly communicate just that.
> >
> > Maybe naming it something like "rollbacked" or "aborted"?
> >
> Good point. I've renamed this field on the attached v8 version.

I've reviewed the v8 patch and I'm not sure it's a bullet-proof
approach. The basic idea of the v8 patch is to add async entries with
rollbacked=false and we set rollbacked=true in
asyncQueueRollbackNotifications called from AtAbort_Notify() if an
error happens between PreCommit_Notify() and AtCommit_Notify(). The
asyncQueueRollbackNotifications() reads SLRU pages to mark entries
'rollbacked' but reading SLRU pages could fail for some reason. In
this case, we would end up with a PANIC for recursive errors, or even
if we skip marking entries we would end up leaving entries as
committed. Also, if there are a lot of notifications across multiple
SLRU pages we need to mark as 'rollbacked',
asyncQueueRollbackNotifications() could take time to complete but it's
not interruptible. I don't think it's a good idea to introduce such an
operation in abort paths.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com



pgsql-hackers by date:

Previous
From: Sami Imseih
Date:
Subject: Re: Skip unregistered custom kinds on stats load
Next
From: Michael Paquier
Date:
Subject: Re: Skip unregistered custom kinds on stats load