Thread: BUG #16961: Could not access status of transaction

BUG #16961: Could not access status of transaction

From

PG Bug reporting form

Date:

13 April 2021, 15:43:52

The following bug has been logged on the website:

Bug reference:      16961
Logged by:          Sergey Zhuravlev
Email address:      sergii.zhuravlev@smartnet.ua
PostgreSQL version: 13.2
Operating system:   CentOS Linux release 7.9.2009 (Core)
Description:

Hello

Command  - LISTEN missed_trades_empty_instrument  
"
ERROR:  could not access status of transaction 1954017648
DETAIL:  Could not open file "pg_xact/0747": No such file or directory.
STATEMENT:  LISTEN missed_trades_empty_instrument
"
Current transaction
# select txid_current();
 txid_current
--------------
   6985716158

Parameter -  autovacuum_freeze_max_age = 200000000

Re: BUG #16961: Could not access status of transaction

From

Stepan Yankevych

Date:

14 April 2021, 16:11:41

Hi Guys!

Let me clarify few things things about the issue.

The issue happening each morning when application starts on the production DataBase during about a month.

Always the same transaction id is mentioned in the error (1954017648)

We tried to do UNLISTEN - no changes. the same issue.

LISTEN works good for any other channels.

Can it be related to some hanged transaction? 1954017648? (for example while some network interruption)

Is it possible to kill/clean it somehow without DB restart?

Can it be related to some non-vacuumed system table or so?

Any other ideas?

Thanks!

--- Original message ---
From: "PG Bug reporting form" <noreply@postgresql.org>
Date: 13 April 2021, 18:50:26

The following bug has been logged on the website:

Bug reference:      16961
Logged by:          Sergey Zhuravlev
Email address:      sergii.zhuravlev@smartnet.ua
PostgreSQL version: 13.2
Operating system:   CentOS Linux release 7.9.2009 (Core)
Description:        

Hello

Command  - LISTEN missed_trades_empty_instrument  
"
ERROR:  could not access status of transaction 1954017648
DETAIL:  Could not open file "pg_xact/0747": No such file or directory.
STATEMENT:  LISTEN missed_trades_empty_instrument
"
Current transaction
# select txid_current(); txid_current
--------------   6985716158

Parameter -  autovacuum_freeze_max_age = 200000000

RE: BUG #16961: Could not access status of transaction

From

Stepan Yankevych

Date:

19 April 2021, 18:49:15

The database was rebooted and the issue disappeared

Hopefully it was one time issue.

From: Stepan Yankevych <stepya@ukr.net>
Sent: Wednesday, April 14, 2021 7:12 PM
To: sergii.zhuravlev@smartnet.ua; pgsql-bugs@lists.postgresql.org
Cc: sergii.zhuravlev@smartnet.ua
Subject: Re: BUG #16961: Could not access status of transaction

Hi Guys!

Let me clarify few things things about the issue.

The issue happening each morning when application starts on the production DataBase during about a month.

Always the same transaction id is mentioned in the error (1954017648)

We tried to do UNLISTEN - no changes. the same issue.

LISTEN works good for any other channels.

Can it be related to some hanged transaction? 1954017648? (for example while some network interruption)

Is it possible to kill/clean it somehow without DB restart?

Can it be related to some non-vacuumed system table or so?

Any other ideas?

Thanks!

--- Original message ---
From: "PG Bug reporting form" <noreply@postgresql.org>
Date: 13 April 2021, 18:50:26

The following bug has been logged on the website:

Bug reference:      16961

Logged by:          Sergey Zhuravlev

Email address:      sergii.zhuravlev@smartnet.ua

PostgreSQL version: 13.2

Operating system:   CentOS Linux release 7.9.2009 (Core)

Description:

Hello

Command  - LISTEN missed_trades_empty_instrument

ERROR:  could not access status of transaction 1954017648

DETAIL:  Could not open file "pg_xact/0747": No such file or directory.

STATEMENT:  LISTEN missed_trades_empty_instrument

Current transaction

# select txid_current();

 txid_current

--------------

6985716158

Parameter -  autovacuum_freeze_max_age = 200000000

Re: BUG #16961: Could not access status of transaction

From

Noah Misch

Date:

05 June 2021, 20:55:45

On Mon, Apr 19, 2021 at 06:49:15PM +0000, Stepan Yankevych wrote:
> The database was rebooted and the issue disappeared

Each postmaster restart clears the LISTEN/NOTIFY queue, so that fits.

> The issue happening each morning when application starts on the production DataBase during about a month.
> Always the same transaction id is mentioned in the error (1954017648)
> We tried to do UNLISTEN - no changes. the same issue.
> LISTEN works good for any other channels.
> 
> 
> Can it be related to some hanged transaction? 1954017648? (for example while some network interruption)

Something like that; see below.

> Is it possible to kill/clean it somehow without DB restart?

Not to my knowledge.

> Can it be related to some non-vacuumed system table or so?

Probably not.

> Command  - LISTEN missed_trades_empty_instrument
> 
> ERROR:  could not access status of transaction 1954017648
> DETAIL:  Could not open file "pg_xact/0747": No such file or directory.
> STATEMENT:  LISTEN missed_trades_empty_instrument

The LISTEN/NOTIFY queue stores a transaction id for each notification
(internally, each AsyncQueueEntry).  I can imagine the "could not access
status" happening if a sequence of events like this happened since the last
postmaster restart:

backend 1: LISTEN missed_trades_empty_instrument
backend 2: BEGIN; NOTIFY missed_trades_empty_instrument [TransactionId N]
backend 1: BEGIN
backend 2: COMMIT
backend 1: CREATE TEMP TABLE x (); [sets TransactionId N+K]
autovacuum: freezes tuples, deletes pg_xact data covering TransactionId N
backend 1: COMMIT

On the other hand, if that's what happened, your report that "LISTEN works
good for any other channels" surprises me.  Perhaps something completely
different happened on your system.

We could prevent the trouble if vac_truncate_clog() had access to the oldest
xid in the notification queue; it would set frozenXID to that value if
frozenXID would otherwise be older.

Re: BUG #16961: Could not access status of transaction

From

Tom Lane

Date:

05 June 2021, 21:25:39

Noah Misch <noah@leadboat.com> writes:
> On Mon, Apr 19, 2021 at 06:49:15PM +0000, Stepan Yankevych wrote:
>> The issue happening each morning when application starts on the production DataBase during about a month.
>> Always the same transaction id is mentioned in the error (1954017648)
>> We tried to do UNLISTEN - no changes. the same issue.
>> LISTEN works good for any other channels.

> ...
> On the other hand, if that's what happened, your report that "LISTEN works
> good for any other channels" surprises me.  Perhaps something completely
> different happened on your system.

I suspect the true state of affairs was more like "LISTEN works good in
any other databases".  The described symptoms are consistent with there
being a message in the NOTIFY queue that has a pruned-away xid.  An
incoming listener would try to scan over already-committed notify
messages, but testing to see whether this stale message is committed
would fail.  However, since asyncQueueProcessPageEntries just ignores
messages not targeted for the current database, incoming listeners
in other databases wouldn't notice the problem.

ISTM the interesting question here is what was holding back truncation
of the NOTIFY queue.  Could there have been an open transaction somewhere
that was failing to collect NOTIFY data?  But a transaction sitting open
for a month is likely to cause far more severe problems than that one.

> We could prevent the trouble if vac_truncate_clog() had access to the oldest
> xid in the notification queue; it would set frozenXID to that value if
> frozenXID would otherwise be older.

Perhaps.  I'm not sure how hard it is to extract the oldest xid in the
queue (where "oldest" is defined as "numerically smallest").  The entries
are in xid commit order which is a different thing.

            regards, tom lane

Re: BUG #16961: Could not access status of transaction

From

Noah Misch

Date:

05 June 2021, 21:31:28

On Sat, Jun 05, 2021 at 05:25:39PM -0400, Tom Lane wrote:
> Noah Misch <noah@leadboat.com> writes:
> > On Mon, Apr 19, 2021 at 06:49:15PM +0000, Stepan Yankevych wrote:
> >> The issue happening each morning when application starts on the production DataBase during about a month.
> >> Always the same transaction id is mentioned in the error (1954017648)
> >> We tried to do UNLISTEN - no changes. the same issue.
> >> LISTEN works good for any other channels.
> 
> > ...
> > On the other hand, if that's what happened, your report that "LISTEN works
> > good for any other channels" surprises me.  Perhaps something completely
> > different happened on your system.
> 
> I suspect the true state of affairs was more like "LISTEN works good in
> any other databases".  The described symptoms are consistent with there
> being a message in the NOTIFY queue that has a pruned-away xid.  An
> incoming listener would try to scan over already-committed notify
> messages, but testing to see whether this stale message is committed
> would fail.  However, since asyncQueueProcessPageEntries just ignores
> messages not targeted for the current database, incoming listeners
> in other databases wouldn't notice the problem.
> 
> ISTM the interesting question here is what was holding back truncation
> of the NOTIFY queue.  Could there have been an open transaction somewhere
> that was failing to collect NOTIFY data?

Open transactions always decline to collect notify data, don't they?  See
ProcessNotifyInterrupt().

> But a transaction sitting open
> for a month is likely to cause far more severe problems than that one.

True.

> > We could prevent the trouble if vac_truncate_clog() had access to the oldest
> > xid in the notification queue; it would set frozenXID to that value if
> > frozenXID would otherwise be older.
> 
> Perhaps.  I'm not sure how hard it is to extract the oldest xid in the
> queue (where "oldest" is defined as "numerically smallest").  The entries
> are in xid commit order which is a different thing.

Yeah, it wouldn't be cheap in the general case.  The value could be a field in
pg_control, updated by a separate VACUUM NOTIFY, which autovacuum would also
run roughly as often as autovacuum visits template0.