Re: LISTEN/NOTIFY bug: VACUUM sets frozenxid past a xid in async queue - Mailing list pgsql-hackers

From Arseniy Mukhin
Subject Re: LISTEN/NOTIFY bug: VACUUM sets frozenxid past a xid in async queue
Date
Msg-id CAE7r3M+7ZBXuo5Rnvm9RsR8qZewxCTH7LfTTCmtJmri4kFs-EQ@mail.gmail.com
Whole thread Raw
In response to Re: LISTEN/NOTIFY bug: VACUUM sets frozenxid past a xid in async queue  (Masahiko Sawada <sawada.mshk@gmail.com>)
List pgsql-hackers
On Tue, Oct 21, 2025 at 4:32 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> On Mon, Oct 20, 2025 at 5:37 AM Matheus Alcantara
> <matheusssilv97@gmail.com> wrote:
> >
> > On Sat Oct 18, 2025 at 2:43 AM -03, Joel Jacobson wrote:
> > > On Fri, Oct 17, 2025, at 22:50, Arseniy Mukhin wrote:
> > > What a funny coincidence that the approach in this patch,
> > > has one similarity with the "Direct advancement" approach
> > > in the patch in the "Optimize LISTEN/NOTIFY" [1] thread,
> > > namely that we're both interested in QUEUE_HEAD before/after
> > > we push the notifications into the queue, in PreCommit_Notify().
> > >
> > > It looks to me like our new data structures are Interchangeable,
> > > so I guess we probably want both patches to eventually settle
> > > on one and the same?
> > >
> > > The differences I note between our queue head before/after code are:
> > >
> > > - In this patch, you are palloc'ing a struct with two fields.
> > >   In [1], we're using two separate static QueuePosition variables.
> > >
> > > - In this patch, you are taking/releasing a shared lock before/after
> > >   the loop to read QUEUE_HEAD and set previousHead/head.
> > >   In [1], we avoid the need of the shared lock, by doing the reads
> > >   within the existing exclusive lock inside the loop, but instead
> > >   therefore need a firstIteration bool, to know which is the first
> > >   iteration, and need to overwrite the after-var in each iteration.
> > >
> > > I don't think the noted differences above matter, both seems fine.
> > >
> > Yeah, I also think that both approach seems fine. I keep the v8 version
> > with the palloc, if someone has any concern about this I'm open to
> > switch to another approach.
> >
> > > Another thing I noticed in your patch that made me wonder,
> > > is the naming of the new AsyncQueueEntry bool field,
> > > which is given the name "committed".
> > >
> > > I think this name is not entirely faithful, since when set to true,
> > > the entry has not been committed yet.
> > >
> > > How about negating the meaning of this boolean field?
> > > To instead indicate when the entry has been rollbacked.
> > > Then, it would clearly communicate just that.
> > >
> > > Maybe naming it something like "rollbacked" or "aborted"?
> > >
> > Good point. I've renamed this field on the attached v8 version.
>
> I've reviewed the v8 patch and I'm not sure it's a bullet-proof
> approach. The basic idea of the v8 patch is to add async entries with
> rollbacked=false and we set rollbacked=true in
> asyncQueueRollbackNotifications called from AtAbort_Notify() if an
> error happens between PreCommit_Notify() and AtCommit_Notify(). The
> asyncQueueRollbackNotifications() reads SLRU pages to mark entries
> 'rollbacked' but reading SLRU pages could fail for some reason. In
> this case, we would end up with a PANIC for recursive errors, or even
> if we skip marking entries we would end up leaving entries as
> committed. Also, if there are a lot of notifications across multiple
> SLRU pages we need to mark as 'rollbacked',
> asyncQueueRollbackNotifications() could take time to complete but it's
> not interruptible. I don't think it's a good idea to introduce such an
> operation in abort paths.
>

Thanks for the explanation! Now I see why using AtAbort_Notify() was a
bad idea. It might be possible to solve the second problem and make
the amount of work in AtAbort_Notify independent of the number of
notifications. But I have no idea how to solve the SLRU page issue.


Best regards,
Arseniy Mukhin



pgsql-hackers by date:

Previous
From: Nathan Bossart
Date:
Subject: Re: abi-compliance-check failure due to recent changes to pg_{clear,restore}_{attribute,relation}_stats()
Next
From: Álvaro Herrera
Date:
Subject: Re: Question for coverage report