Re: LISTEN/NOTIFY bug: VACUUM sets frozenxid past a xid in async queue - Mailing list pgsql-hackers

From Matheus Alcantara
Subject Re: LISTEN/NOTIFY bug: VACUUM sets frozenxid past a xid in async queue
Date
Msg-id DCJGBRB9RUV4.39SNC2UAKVCG3@gmail.com
In response to Re: LISTEN/NOTIFY bug: VACUUM sets frozenxid past a xid in async queue  (Jacques Combrink <jacques@quantsolutions.co.za>)
List pgsql-hackers
On Mon Sep 1, 2025 at 11:06 AM -03, Jacques Combrink wrote:
> TLDR:
> active listener on one database causes notify on another database to get
> stuck.
> At no point could I get a stuck notify if I don't have a listener on at
> least one other database than the one I am notifying on. See the Extra
> weirdness section.
> At no point do you need to have any other queries running, there is
> never an idle in transaction query needed for bad timing with the vacuum.
>
> I hope I explained everything well enough so that one of you smart
> people can find and fix the problem.
>
The long-running transaction steps are just one example of how we can
lose notifications with the first patch from Daniil that Alex shared in
[1]. The steps that you've shared are another way to trigger the issue,
similar to the steps that Alex also shared in [1].

All these different ways to trigger the error share the same underlying
problem: if a notification is kept in the queue long enough for VACUUM
FREEZE to run and truncate the clog files that contain the transaction
information for this notification, the error will happen.
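
The failure mode can be illustrated with a toy model in Python. This is
not PostgreSQL code: ToyClog, truncate_before() and status() are
hypothetical names made up for the illustration, and xid wraparound is
ignored. It only shows why a queue entry that outlives the truncation of
its commit-status data can no longer be processed.

```python
class ToyClog:
    """Toy stand-in for the commit log: maps xid -> commit status."""

    def __init__(self):
        self.status_by_xid = {}
        self.oldest_kept_xid = 0  # everything below this has been truncated

    def record_commit(self, xid):
        self.status_by_xid[xid] = "committed"

    def truncate_before(self, cutoff_xid):
        # Models a freeze advancing frozenxid and dropping old clog data.
        for xid in [x for x in self.status_by_xid if x < cutoff_xid]:
            del self.status_by_xid[xid]
        self.oldest_kept_xid = max(self.oldest_kept_xid, cutoff_xid)

    def status(self, xid):
        if xid < self.oldest_kept_xid:
            raise LookupError(f"could not access status of transaction {xid}")
        return self.status_by_xid[xid]


clog = ToyClog()
notify_queue = []  # each entry: (xid of the notifying transaction, payload)

# A NOTIFY commits with xid 100 and its entry stays unread in the queue.
clog.record_commit(100)
notify_queue.append((100, "hello"))

# Later, a freeze advances past xid 100 without looking at the queue.
clog.truncate_before(200)

# A listener now processes the old entry and needs its commit status.
try:
    xid, payload = notify_queue[0]
    clog.status(xid)
    outcome = "delivered"
except LookupError as exc:
    outcome = f"error: {exc}"

print(outcome)
```

In this model the lookup fails because the entry for xid 100 is still in
the queue while its status data is already gone.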

The patch that I attached in [2] aims to fix the issue triggered by the
steps that you've shared, but during testing I found a stack overflow
bug in AsyncQueueIterNextNotification() caused by the number of
notifications. I'm attaching a new version that fixes this bug. I tried
to reproduce your steps with this new version and the issue seems to be
fixed.

Note that notifications that were added without any previous LISTEN will
block the xid advance during VACUUM FREEZE until there is a listener on
the database that owns these notifications. The XXX comment in vacuum.c
is about this problem.
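
To illustrate the trade-off described above, here is a toy Python sketch
of capping the freeze cutoff at the oldest xid still in the queue. The
function name freeze_cutoff() and the data layout are hypothetical and
are not taken from the patch; xid wraparound is ignored.

```python
def freeze_cutoff(desired_cutoff, queued_xids):
    """Return the cutoff a freeze may use without passing a queued xid."""
    if not queued_xids:
        return desired_cutoff
    return min(desired_cutoff, min(queued_xids))


# Notifications were queued by xids 100 and 150, and nobody is listening.
queued_xids = [100, 150]

# The freeze would like to advance to 500 but is held back at 100.
blocked = freeze_cutoff(500, queued_xids)

# Once a listener on the owning database consumes the entries,
# the queue is empty and the cutoff can advance as requested.
queued_xids.clear()
unblocked = freeze_cutoff(500, queued_xids)

print(blocked, unblocked)
```

As long as nothing consumes the queued entries, the cutoff in this model
stays at the oldest queued xid, which is the blocking behavior noted
above.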

[1] https://www.postgresql.org/message-id/CAK98qZ3wZLE-RZJN_Y%2BTFjiTRPPFPBwNBpBi5K5CU8hUHkzDpw%40mail.gmail.com
[2] https://www.postgresql.org/message-id/CAFY6G8cJm73_MM9SuynZUqtqcaTuepUDgDuvS661oLW7U0dgsg%40mail.gmail.com

--
Matheus Alcantara

Attachment
