Re: LISTEN/NOTIFY bug: VACUUM sets frozenxid past a xid in async queue - Mailing list pgsql-hackers
From | Arseniy Mukhin |
---|---|
Subject | Re: LISTEN/NOTIFY bug: VACUUM sets frozenxid past a xid in async queue |
Date | |
Msg-id | CAE7r3M+7ZBXuo5Rnvm9RsR8qZewxCTH7LfTTCmtJmri4kFs-EQ@mail.gmail.com Whole thread Raw |
In response to | Re: LISTEN/NOTIFY bug: VACUUM sets frozenxid past a xid in async queue (Masahiko Sawada <sawada.mshk@gmail.com>) |
List | pgsql-hackers |
On Tue, Oct 21, 2025 at 4:32 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > On Mon, Oct 20, 2025 at 5:37 AM Matheus Alcantara > <matheusssilv97@gmail.com> wrote: > > > > On Sat Oct 18, 2025 at 2:43 AM -03, Joel Jacobson wrote: > > > On Fri, Oct 17, 2025, at 22:50, Arseniy Mukhin wrote: > > > What a funny coincidence that the approach in this patch, > > > has one similarity with the "Direct advancement" approach > > > in the patch in the "Optimize LISTEN/NOTIFY" [1] thread, > > > namely that we're both interested in QUEUE_HEAD before/after > > > we push the notifications into the queue, in PreCommit_Notify(). > > > > > > It looks to me like our new data structures are Interchangeable, > > > so I guess we probably want both patches to eventually settle > > > on one and the same? > > > > > > The differences I note between our queue head before/after code are: > > > > > > - In this patch, you are palloc'ing a struct with two fields. > > > In [1], we're using two separate static QueuePosition variables. > > > > > > - In this patch, you are taking/releasing a shared lock before/after > > > the loop to read QUEUE_HEAD and set previousHead/head. > > > In [1], we avoid the need of the shared lock, by doing the reads > > > within the existing exclusive lock inside the loop, but instead > > > therefore need a firstIteration bool, to know which is the first > > > iteration, and need to overwrite the after-var in each iteration. > > > > > > I don't think the noted differences above matter, both seems fine. > > > > > Yeah, I also think that both approach seems fine. I keep the v8 version > > with the palloc, if someone has any concern about this I'm open to > > switch to another approach. > > > > > Another thing I noticed in your patch that made me wonder, > > > is the naming of the new AsyncQueueEntry bool field, > > > which is given the name "committed". > > > > > > I think this name is not entirely faithful, since when set to true, > > > the entry has not been committed yet. > > > > > > How about negating the meaning of this boolean field? > > > To instead indicate when the entry has been rollbacked. > > > Then, it would clearly communicate just that. > > > > > > Maybe naming it something like "rollbacked" or "aborted"? > > > > > Good point. I've renamed this field on the attached v8 version. > > I've reviewed the v8 patch and I'm not sure it's a bullet-proof > approach. The basic idea of the v8 patch is to add async entries with > rollbacked=false and we set rollbacked=true in > asyncQueueRollbackNotifications called from AtAbort_Notify() if an > error happens between PreCommit_Notify() and AtCommit_Notify(). The > asyncQueueRollbackNotifications() reads SLRU pages to mark entries > 'rollbacked' but reading SLRU pages could fail for some reason. In > this case, we would end up with a PANIC for recursive errors, or even > if we skip marking entries we would end up leaving entries as > committed. Also, if there are a lot of notifications across multiple > SLRU pages we need to mark as 'rollbacked', > asyncQueueRollbackNotifications() could take time to complete but it's > not interruptible. I don't think it's a good idea to introduce such an > operation in abort paths. > Thanks for the explanation! Now I see why using AtAbort_Notify() was a bad idea. It might be possible to solve the second problem and make the amount of work in AtAbort_Notify independent of the number of notifications. But I have no idea how to solve the SLRU page issue. Best regards, Arseniy Mukhin
pgsql-hackers by date: