Thread: BUG #16961: Could not access status of transaction
The following bug has been logged on the website: Bug reference: 16961 Logged by: Sergey Zhuravlev Email address: sergii.zhuravlev@smartnet.ua PostgreSQL version: 13.2 Operating system: CentOS Linux release 7.9.2009 (Core) Description: Hello Command - LISTEN missed_trades_empty_instrument " ERROR: could not access status of transaction 1954017648 DETAIL: Could not open file "pg_xact/0747": No such file or directory. STATEMENT: LISTEN missed_trades_empty_instrument " Current transaction # select txid_current(); txid_current -------------- 6985716158 Parameter - autovacuum_freeze_max_age = 200000000
Any other ideas?
Thanks!
From: "PG Bug reporting form" <noreply@postgresql.org>
Date: 13 April 2021, 18:50:26
The following bug has been logged on the website: Bug reference: 16961 Logged by: Sergey Zhuravlev Email address: sergii.zhuravlev@smartnet.ua PostgreSQL version: 13.2 Operating system: CentOS Linux release 7.9.2009 (Core) Description: Hello Command - LISTEN missed_trades_empty_instrument " ERROR: could not access status of transaction 1954017648 DETAIL: Could not open file "pg_xact/0747": No such file or directory. STATEMENT: LISTEN missed_trades_empty_instrument " Current transaction # select txid_current(); txid_current -------------- 6985716158 Parameter - autovacuum_freeze_max_age = 200000000
The database was rebooted and the issue disappeared
Hopefully it was one time issue.
From: Stepan Yankevych <stepya@ukr.net>
Sent: Wednesday, April 14, 2021 7:12 PM
To: sergii.zhuravlev@smartnet.ua; pgsql-bugs@lists.postgresql.org
Cc: sergii.zhuravlev@smartnet.ua
Subject: Re: BUG #16961: Could not access status of transaction
Hi Guys!
Let me clarify few things things about the issue.
The issue happening each morning when application starts on the production DataBase during about a month.
Always the same transaction id is mentioned in the error (1954017648)
We tried to do UNLISTEN - no changes. the same issue.
LISTEN works good for any other channels.
Can it be related to some hanged transaction? 1954017648? (for example while some network interruption)
Is it possible to kill/clean it somehow without DB restart?
Can it be related to some non-vacuumed system table or so?
Any other ideas?
Thanks!
--- Original message ---
From: "PG Bug reporting form" <noreply@postgresql.org>
Date: 13 April 2021, 18:50:26
The following bug has been logged on the website:Bug reference: 16961Logged by: Sergey ZhuravlevEmail address: sergii.zhuravlev@smartnet.uaPostgreSQL version: 13.2Operating system: CentOS Linux release 7.9.2009 (Core)Description:HelloCommand - LISTEN missed_trades_empty_instrument"ERROR: could not access status of transaction 1954017648DETAIL: Could not open file "pg_xact/0747": No such file or directory.STATEMENT: LISTEN missed_trades_empty_instrument"Current transaction# select txid_current();txid_current--------------6985716158Parameter - autovacuum_freeze_max_age = 200000000
On Mon, Apr 19, 2021 at 06:49:15PM +0000, Stepan Yankevych wrote: > The database was rebooted and the issue disappeared Each postmaster restart clears the LISTEN/NOTIFY queue, so that fits. > The issue happening each morning when application starts on the production DataBase during about a month. > Always the same transaction id is mentioned in the error (1954017648) > We tried to do UNLISTEN - no changes. the same issue. > LISTEN works good for any other channels. > > > Can it be related to some hanged transaction? 1954017648? (for example while some network interruption) Something like that; see below. > Is it possible to kill/clean it somehow without DB restart? Not to my knowledge. > Can it be related to some non-vacuumed system table or so? Probably not. > Command - LISTEN missed_trades_empty_instrument > > ERROR: could not access status of transaction 1954017648 > DETAIL: Could not open file "pg_xact/0747": No such file or directory. > STATEMENT: LISTEN missed_trades_empty_instrument The LISTEN/NOTIFY queue stores a transaction id for each notification (internally, each AsyncQueueEntry). I can imagine the "could not access status" happening if a sequence of events like this happened since the last postmaster restart: backend 1: LISTEN missed_trades_empty_instrument backend 2: BEGIN; NOTIFY missed_trades_empty_instrument [TransactionId N] backend 1: BEGIN backend 2: COMMIT backend 1: CREATE TEMP TABLE x (); [sets TransactionId N+K] autovacuum: freezes tuples, deletes pg_xact data covering TransactionId N backend 1: COMMIT On the other hand, if that's what happened, your report that "LISTEN works good for any other channels" surprises me. Perhaps something completely different happened on your system. We could prevent the trouble if vac_truncate_clog() had access to the oldest xid in the notification queue; it would set frozenXID to that value if frozenXID would otherwise be older.
Noah Misch <noah@leadboat.com> writes: > On Mon, Apr 19, 2021 at 06:49:15PM +0000, Stepan Yankevych wrote: >> The issue happening each morning when application starts on the production DataBase during about a month. >> Always the same transaction id is mentioned in the error (1954017648) >> We tried to do UNLISTEN - no changes. the same issue. >> LISTEN works good for any other channels. > ... > On the other hand, if that's what happened, your report that "LISTEN works > good for any other channels" surprises me. Perhaps something completely > different happened on your system. I suspect the true state of affairs was more like "LISTEN works good in any other databases". The described symptoms are consistent with there being a message in the NOTIFY queue that has a pruned-away xid. An incoming listener would try to scan over already-committed notify messages, but testing to see whether this stale message is committed would fail. However, since asyncQueueProcessPageEntries just ignores messages not targeted for the current database, incoming listeners in other databases wouldn't notice the problem. ISTM the interesting question here is what was holding back truncation of the NOTIFY queue. Could there have been an open transaction somewhere that was failing to collect NOTIFY data? But a transaction sitting open for a month is likely to cause far more severe problems than that one. > We could prevent the trouble if vac_truncate_clog() had access to the oldest > xid in the notification queue; it would set frozenXID to that value if > frozenXID would otherwise be older. Perhaps. I'm not sure how hard it is to extract the oldest xid in the queue (where "oldest" is defined as "numerically smallest"). The entries are in xid commit order which is a different thing. regards, tom lane
On Sat, Jun 05, 2021 at 05:25:39PM -0400, Tom Lane wrote: > Noah Misch <noah@leadboat.com> writes: > > On Mon, Apr 19, 2021 at 06:49:15PM +0000, Stepan Yankevych wrote: > >> The issue happening each morning when application starts on the production DataBase during about a month. > >> Always the same transaction id is mentioned in the error (1954017648) > >> We tried to do UNLISTEN - no changes. the same issue. > >> LISTEN works good for any other channels. > > > ... > > On the other hand, if that's what happened, your report that "LISTEN works > > good for any other channels" surprises me. Perhaps something completely > > different happened on your system. > > I suspect the true state of affairs was more like "LISTEN works good in > any other databases". The described symptoms are consistent with there > being a message in the NOTIFY queue that has a pruned-away xid. An > incoming listener would try to scan over already-committed notify > messages, but testing to see whether this stale message is committed > would fail. However, since asyncQueueProcessPageEntries just ignores > messages not targeted for the current database, incoming listeners > in other databases wouldn't notice the problem. > > ISTM the interesting question here is what was holding back truncation > of the NOTIFY queue. Could there have been an open transaction somewhere > that was failing to collect NOTIFY data? Open transactions always decline to collect notify data, don't they? See ProcessNotifyInterrupt(). > But a transaction sitting open > for a month is likely to cause far more severe problems than that one. True. > > We could prevent the trouble if vac_truncate_clog() had access to the oldest > > xid in the notification queue; it would set frozenXID to that value if > > frozenXID would otherwise be older. > > Perhaps. I'm not sure how hard it is to extract the oldest xid in the > queue (where "oldest" is defined as "numerically smallest"). The entries > are in xid commit order which is a different thing. Yeah, it wouldn't be cheap in the general case. The value could be a field in pg_control, updated by a separate VACUUM NOTIFY, which autovacuum would also run roughly as often as autovacuum visits template0.