Hi,
On Wed, Jun 25, 2025 at 10:29 AM Noah Misch <noah@leadboat.com> wrote:
>
> On Sat, Jun 05, 2021 at 05:25:39PM -0400, Tom Lane wrote:
> >
> > ISTM the interesting question here is what was holding back truncation
> > of the NOTIFY queue. Could there have been an open transaction somewhere
> > that was failing to collect NOTIFY data?
>
> Open transactions always decline to collect notify data, don't they? See
> ProcessNotifyInterrupt().
>
> > But a transaction sitting open
> > for a month is likely to cause far more severe problems than that one.
>
> True.
>
One of our clients faced this problem, and I found out that long
transactions may have nothing to do with it.
We have the following logic in the notify queue :
If there are no listeners within all databases, and we are calling
LISTEN, then we must iterate from 'tail' to 'head' of the queue and
check statuses of transactions (see Exec_ListenPreCommit).
If there is a pruned-away xid in the queue, we will try to access its
status and get an error.
Because the tail of the queue is not necessarily always advanced
forward by the listeners, we can get such error without any long lived
transactions.
> > > We could prevent the trouble if vac_truncate_clog() had access to the oldest
> > > xid in the notification queue; it would set frozenXID to that value if
> > > frozenXID would otherwise be older.
> >
> > Perhaps. I'm not sure how hard it is to extract the oldest xid in the
> > queue (where "oldest" is defined as "numerically smallest"). The entries
> > are in xid commit order which is a different thing.
>
> Yeah, it wouldn't be cheap in the general case. The value could be a field in
> pg_control, updated by a separate VACUUM NOTIFY, which autovacuum would also
> run roughly as often as autovacuum visits template0.
As a temporary solution, which almost completely eliminates the
possibility of such a situation, I suggest adding a
AsyncQueueAdvanceTail call to vacuum (inside vac_update_datfrozenxid).
I mean 'static asyncQueueAdvanceTail()' that should be made
'external'. Thus, we can get rid of the problem that I described
above.
But there is one more : if the listener client is lagging, the
AsyncQueueAdvanceTail call will not save us (because it cannot advance
tail any further than the listener's position in the queue).
Again, this may be due to a very high load, but not because the client
keeps the transaction open for a very long time.
The best solution is to teach vacuum to recognize the minimum xid in
constant time, but I didn't come up with any sane implementations.
What do you think?
--
Best regards,
Daniil Davydov
Postgres Professional