Re: LISTEN/NOTIFY bug: VACUUM sets frozenxid past a xid in async queue - Mailing list pgsql-hackers

From Matheus Alcantara
Subject Re: LISTEN/NOTIFY bug: VACUUM sets frozenxid past a xid in async queue
Msg-id DD0JHP2L085C.19I49XELORB20@gmail.com
In response to Re: LISTEN/NOTIFY bug: VACUUM sets frozenxid past a xid in async queue  (Arseniy Mukhin <arseniy.mukhin.dev@gmail.com>)
Responses Re: LISTEN/NOTIFY bug: VACUUM sets frozenxid past a xid in async queue
List pgsql-hackers
On Tue Sep 23, 2025 at 1:11 PM -03, Arseniy Mukhin wrote:
>> Thanks for the explanation! I'm just not sure I understand why we
>> need the LWLockAcquire(NotifyQueueLock, LW_EXCLUSIVE) in
>> PreCommit_Notify() if we already have the
>> LockSharedObject(DatabaseRelationId, InvalidOid, 0, AccessExclusiveLock);
>>
>
> Good question. It seems that it would be enough to hold
> NotifyQueueLock only during the head update, so I don't understand it
> either.
>
IIUC, we acquire the LockSharedObject(DatabaseRelationId, InvalidOid, 0,
AccessExclusiveLock) to make other COMMITs wait, and the
LWLockAcquire(NotifyQueueLock, LW_EXCLUSIVE) to make listener backends
wait while we are adding entries to the queue.
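For reference, the locking order in PreCommit_Notify() looks roughly
like this (a simplified sketch of the flow, not the exact source):

```
/* PreCommit_Notify() -- simplified sketch */
LockSharedObject(DatabaseRelationId, InvalidOid, 0, AccessExclusiveLock);
                                        /* serialize committing notifiers */
while (nextNotify != NULL)
{
    LWLockAcquire(NotifyQueueLock, LW_EXCLUSIVE);   /* block listeners    */
    ...
    nextNotify = asyncQueueAddEntries(nextNotify);  /* advances QUEUE_HEAD */
    LWLockRelease(NotifyQueueLock);
}
```

So the heavyweight lock orders writers against each other, while
NotifyQueueLock protects each batch of head updates from concurrent
readers.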

> Thank you for the new version!
>
> The fix looks exactly how I thought about it. But while thinking about
> why we need to hold NotifyQueueLock in PreCommit_Notify I realized
> there is a concurrency bug in the 'head resetting' approach. I thought
> that listeners hold NotifyQueueLock lock while reading notifications
> from the queue, but now I see that they hold it only while reading the
> current head position. So we can end up in the next situation:
>
> There are 2 backends:  listener and writer (backend with notifications)
>
>
>         listener                                   writer
>   ----------------------------------------------------------------------
>                                          writer wants to commit, so it
>                                          adds notifications to the queue
>                                          in PreCommit_Notify() and advances
>                                          queue head.
>
>   ----------------------------------------------------------------------
> listener wants to read notifications. It
> gets into asyncQueueReadAllNotifications()
> and reads the current head (it already
> contains writer's notifications):
>
>
> asyncQueueReadAllNotifications()
>           ...
>     LWLockAcquire(NotifyQueueLock, LW_SHARED);
>     head = QUEUE_HEAD;
>     LWLockRelease(NotifyQueueLock);
>           ...
>
>   ----------------------------------------------------------------------
>                                          writer failed to commit, so it
>                                          resets queue head in AtAbort_Notify()
>                                          and completes the transaction.
>
>   ----------------------------------------------------------------------
> listener gets a snapshot where the writer
> is not in progress.
>
>         ...
>     snapshot = RegisterSnapshot(GetLatestSnapshot());
>         ...
>
>
> This way the listener reads the head that includes all writer's
> notifications and a snapshot where the writer is not in progress, so
> nothing stops the listener from sending these notifications and it's
> even possible to have the listener's position that is after the queue
> head, so yes, it's bad :( Sorry about that.
>
Yeah, this is bad. I'm wondering if we could reproduce such race
condition scenarios with some TAP tests.

> Probably we can fix it (swap GetLatestSnapshot() with the head
> position reading for example) but now I think that 'committed' field
> approach is a more robust way to fix the bug. What do you think?
>
I also agree that the committed field seems a safer approach.

> BTW one thing about 'committed' field fix version [0]. It seems that
> instead of remembering all notification positions, we can remember the
> head position after we acquire global lock and before we write
> anything to the queue and in case of abort we can just start from the
> saved position and mark all notifications until the end of the queue
> as 'committed = false'. The reason is the same as with resetting the
> head approach - as we hold global lock, no writers can add
> notifications to the queue, so when we in AtAbort_Notify() all
> notifications after the saved position are ours.
>
See the new attached version, which implements this idea using the
committed field approach. I was just a bit concerned about a race
condition where QUEUE_HEAD is changed by another publisher process: if
we simply iterated over all entries from the saved previous QUEUE_HEAD
position up to the current QUEUE_HEAD, we could mark successfully
committed notifications as not committed by mistake. So in this new
patch version I save the QUEUE_HEAD position before we add the entries
to the shared queue and also the QUEUE_HEAD position after these
entries are added, so that we only process the entries in this range,
although the global lock LockSharedObject(DatabaseRelationId) may
already prevent this situation.

What do you think?

--
Matheus Alcantara


Attachment
