On Fri Sep 5, 2025 at 9:56 AM -03, Arseniy Mukhin wrote:
>> Ok, I agree that that this may happen but I don't see this as a common
>> case to fix the issue based on this behaviour. I think that we check the
>> transaction status also to skip notifications that were added on the
>> queue by transactions that are not fully committed yet, and I see this
>> scenario as a more common case but I could be wrong.
>>
>
> IIUC we don't need clog to check if the notification transaction is
> still in progress, we use a snapshot for that (XidInMVCCSnapshot()).
> If we see that the notification transaction is still in progress, we
> postpone processing of that notification. If we see that notification
> transaction is not in progress, then we check its status with
> TransactionIdDidCommit() (using clog) and only then fail if clog
> doesn't have notification transcation data (as a quick check, we can
> see in the stack trace that the failure occurs during the call to
> TransactionIdDidCommit, not XidInMVCCSnapshot).
>
> So we need to check notification transaction status only to understand
> if a transaction was committed and we can send notification or it was
> aborted (somewhere in between PreCommit_Notify and AtCommit_Notify)
> and we should skip notification. And the solution that Yura Sokolov
> suggested is to introduce a 'commited' flag, so we can check it
> instead of calling TransactionIdDidCommit. Another option is to make
> aborted transactions to clean up after themselves in AtAbort_Notify(),
> so if listener see notification in the queue and transaction that
> wrote it is not in progress, then such notification was always created
> by committed transaction and we also don't need to call
> TransactionIdDidCommit. One way for aborted transactions to clean up
> is to set its notifications dbOid to 0, so listeners skip them.
> Another option I think is to set queue HEAD back to the position where
> it was when the aborted transaction started to write to the queue. It
> also makes such aborted transaction notifications 'invisible' for
> listeners. We have a global lock and there is always only one writer,
> so it sounds possible. And there is another option which is the patch
> that Rishu Bagga is working on. In the patch all transactions write to
> the listen/notify queue only after they wrote a commit record to the
> WAL, so all notifications in the queue are always by committed
> transactions and we don't need to call TransactionIdDidCommit.
>
You are right, I missed the XidInMVCCSnapshot() call, sorry about that
and thanks very much for all the explanation!
I'll keep on hold the development of new patch versions for this thread
and focus on review and test the patch from Rishu at [1] to see if we
can make progress using the WAL approach.
[1] https://www.postgresql.org/message-id/CAK80%3DjipUfGC%2BUQSzeA4oCP9daRtHZGm2SQZWLxC9NWmVTDtRQ%40mail.gmail.com
--
Matheus Alcantara