On Wed Oct 22, 2025 at 1:31 AM -03, Masahiko Sawada wrote:
> On Tue, Oct 21, 2025 at 4:16 PM Matheus Alcantara
>> > I think adding a new GUC would be overkill for this fix. As for
>> > dropping old notifications from the queue, we probably don't need to
>> > make it configurable - we could simply drop notifications whose commit
>> > status is no longer available (instead of raising an error).
>> >
>> IIUC this is about not making the vacuum freeze considering the oldest
>> xid on the queue but just remove notifications whose transaction status
>> is no longer available right? Since currently when the error happens we
>> already can't process the notifications it seems a reasonable way to go
>> IMO.
>
> On second thought, simply hiding the error would be worse than our
> current behavior. Users wouldn't know their notifications are being
> dropped, as they often don't check WARNINGs. The more frequently they
> try to freeze XIDs, the more notifications they'd lose. To avoid
> silent discards, they would need to increase
> autovacuum_freeze_max_age to accommodate more clog entries, but
> this increases the risk of XID wraparound. I think the proposed
> approach modifying the vacuum freeze to consider the oldest XID on the
> queue would be better. This has a downside as I mentioned: processes
> in idle-in-transaction state even without backend_xmin and backend_xid
> can still accumulate unconsumed notifications. However, leaving
> transactions in idle-in-transaction state for a long time is bad
> practice anyway. While we might want to consider adding a safeguard
> for this case, I guess it would rarely occur in practice.
>
I'm attaching a v9 patch, which is based on the idea of making vacuum
freeze take into account the oldest XID in the listen/notify queue. The
0001 patch is Joel's, previously sent in [1], with some small tweaks,
and 0002 contains the TAP tests introduced in previous versions by me
and by Arseniy. I'm keeping them separate because I'm not sure whether
they are all suitable for back-patching.
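
To illustrate the idea (this is a standalone sketch, not code from the
patch): the freeze cutoff is held back so that it never advances past
the oldest XID still referenced by the notification queue. The names
clamp_freeze_cutoff and xid_precedes are hypothetical, and xid_precedes
only mimics the modulo-2^32 comparison PostgreSQL uses for normal XIDs.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TransactionId;

#define InvalidTransactionId ((TransactionId) 0)

/*
 * Wraparound-aware "a is older than b" for normal XIDs.
 * Hypothetical stand-in for a modulo-2^32 XID comparison.
 */
static bool
xid_precedes(TransactionId a, TransactionId b)
{
    int32_t diff = (int32_t) (a - b);

    return diff < 0;
}

/*
 * Hypothetical helper: return the freeze cutoff to use, given the cutoff
 * vacuum would otherwise pick and the oldest XID still present in the
 * notification queue (InvalidTransactionId if the queue is empty).
 */
static TransactionId
clamp_freeze_cutoff(TransactionId cutoff, TransactionId oldest_queue_xid)
{
    /* Empty queue: nothing to protect, keep the cutoff unchanged. */
    if (oldest_queue_xid == InvalidTransactionId)
        return cutoff;

    /* Queue still needs an older XID: hold the cutoff back to it. */
    if (xid_precedes(oldest_queue_xid, cutoff))
        return oldest_queue_xid;

    return cutoff;
}
```

With such a clamp, the commit status of every XID in the queue stays
available until the corresponding notifications have been consumed.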

I'm wondering whether 002_aborted_tx_notifies.pl is still needed with
this approach. I think it's not, but perhaps it is still a good test to
keep?

[1] https://www.postgresql.org/message-id/25651193-da4e-4185-a564-f2efa6b0c8a4%40app.fastmail.com
--
Matheus Alcantara