Re: LISTEN/NOTIFY bug: VACUUM sets frozenxid past a xid in async queue - Mailing list pgsql-hackers
From | Matheus Alcantara |
---|---|
Subject | Re: LISTEN/NOTIFY bug: VACUUM sets frozenxid past a xid in async queue |
Date | |
Msg-id | DDODT698DICW.1OMV62ZFTGAUG@gmail.com |
In response to | Re: LISTEN/NOTIFY bug: VACUUM sets frozenxid past a xid in async queue (Masahiko Sawada <sawada.mshk@gmail.com>) |
Responses | Re: LISTEN/NOTIFY bug: VACUUM sets frozenxid past a xid in async queue; Re: LISTEN/NOTIFY bug: VACUUM sets frozenxid past a xid in async queue |
List | pgsql-hackers |
On 21/10/25 18:42, Masahiko Sawada wrote:
> On Mon, Oct 20, 2025 at 11:19 AM Matheus Alcantara
> <matheusssilv97@gmail.com> wrote:
>>
>> On Mon Oct 20, 2025 at 11:18 AM -03, Álvaro Herrera wrote:
>>> On 2025-Oct-20, Matheus Alcantara wrote:
>>>
>>>> This is similar to what was already proposed at [1]. This approach was
>>>> abandoned because a notification on the queue may block datfrozenxid
>>>> advance and clog truncation which can cause other issues for the users [2].
>>>
>>> Well, I think that this is the right solution for backpatching, and that
>>> you were wrong to abandon it. You can continue to design a better
>>> mechanism for the master branch, but in old branches we cannot really do
>>> all those things you're proposing to do.
>>>
>> I actually would prefer this approach TBH, but since this can cause
>> other issues like transaction wraparound due to not consumed
>> notifications we would need other mechanisms to prevent that and I'm not
>> sure if users should expect this kind of behavior changes on minor
>> version updates?
>
> True, unconsumed notifications could cause transaction wraparound by
> preventing datfrozenxid from advancing. However, this risk only
> applies when users have long-term unconsumed notifications, which is
> uncommon. That said, we should note that, as I mentioned
> previously[1], a process can accumulate unconsumed notifications
> simply by being in idle-in-transaction state, even without
> backend_xmin and backend_xid, which prevents datfrozenxid from
> advancing. While this might not be problematic in practice if it's
> rare, I find it concerning that we have no way to check the age of
> unconsumed notifications.
>
OK, I think I was too conservative about the transaction wraparound
issue. I agree that this seems an uncommon scenario.
>> I think that to go with this solution we would need some way to drop too
>> old notifications from the queue to advance the datfrozenxid, so I
>> imagine that we would need some GUC to make this configurable and we can
>> configure a default value of course but some use cases may not be the
>> best configuration; is this something that users should be expected to
>> deal with on minor version updates?
>
> I think adding a new GUC would be overkill for this fix. As for
> dropping old notifications from the queue, we probably don't need to
> make it configurable - we could simply drop notifications whose commit
> status is no longer available (instead of raising an error).
>
IIUC, this is about not making VACUUM freeze consider the oldest xid in
the queue, but just removing notifications whose transaction status is
no longer available, right? Since we already can't process the
notifications when the error happens, this seems a reasonable way to go
IMO.

>> Going with the "self contained" idea sounds easier to backpatch
>> actually, so this is the main reason that I abandoned this other
>> approach. Could you please point out what makes the v8 version not
>> viable for backpatching?
>
> Regarding the v8 patch, it introduces a fundamentally new way of
> managing notification entries (adding entries with 'committed' state
> and marking them 'aborted' in abort paths). This affects all use
> cases, not just those involving very old unconsumed notifications, and
> could introduce more serious bugs like PANIC or SEGV. For
> backpatching, I prefer targeting just the problematic behavior while
> leaving unrelated parts unchanged. Though Álvaro might have a
> different perspective on this.
>
Thanks very much for this explanation and for what you previously wrote
in [1]. It's clear to me now that the v8 architecture is not a good way
to go.

[1] https://www.postgresql.org/message-id/CAD21AoCFZxXCBy%2B5DoarfG9LC9VdNwWRDpDHE5sdTh5Ym0EcqQ%40mail.gmail.com

--
Matheus Alcantara