Noah Misch <noah@leadboat.com> writes:
> On Mon, Apr 19, 2021 at 06:49:15PM +0000, Stepan Yankevych wrote:
>> The issue happening each morning when application starts on the production DataBase during about a month.
>> Always the same transaction id is mentioned in the error (1954017648)
>> We tried to do UNLISTEN - no changes. the same issue.
>> LISTEN works good for any other channels.
> ...
> On the other hand, if that's what happened, your report that "LISTEN works
> good for any other channels" surprises me. Perhaps something completely
> different happened on your system.
I suspect the true state of affairs was more like "LISTEN works good in
any other databases". The described symptoms are consistent with there
being a message in the NOTIFY queue that has a pruned-away xid. An
incoming listener would try to scan over already-committed notify
messages, but testing to see whether this stale message is committed
would fail. However, since asyncQueueProcessPageEntries just ignores
messages not targeted for the current database, incoming listeners
in other databases wouldn't notice the problem.
ISTM the interesting question here is what was holding back truncation
of the NOTIFY queue. Could there have been an open transaction somewhere
that was failing to collect NOTIFY data? But a transaction sitting open
for a month is likely to cause far more severe problems than that one.
> We could prevent the trouble if vac_truncate_clog() had access to the oldest
> xid in the notification queue; it would set frozenXID to that value if
> frozenXID would otherwise be older.
Perhaps. I'm not sure how hard it is to extract the oldest xid in the
queue (where "oldest" is defined as "numerically smallest"). The entries
are in xid commit order which is a different thing.
regards, tom lane