BUG #18804: LISTEN on channel fails with "could not access status of transaction" - Mailing list pgsql-bugs

From PG Bug reporting form
Subject BUG #18804: LISTEN on channel fails with "could not access status of transaction"
Date
Msg-id 18804-bccbbde5e77a68c2@postgresql.org
Whole thread Raw
List pgsql-bugs
The following bug has been logged on the website:

Bug reference:      18804
Logged by:          Andrei Varashen
Email address:      voroshen.av@gmail.com
PostgreSQL version: 16.3
Operating system:   Debian GNU/Linux 12 (bookworm)
Description:

Hi there, 

I've encountered the error while trying to register a listener with the
LISTEN channel_name statement in my production database:

STATEMENT: LISTEN river_leadership
ERROR: could not access status of transaction 2048841894
DETAIL: Could not open file "pg_xact/07A1": No such file or directory.

That also led to an error when attempting to register a listener for any
other channel in that database.

After some investigation, I found that similar issue has been reported
several times already, but none of the investigations cocluded with a
solution except for recommendation to restart postmaster in order to clean
up the LISTEN/NOTIFY queue (as far as I understood). This workaround works,
but the root cause of this behaviour remains unaddressed. Similar bug
reports:

1.
https://www.postgresql.org/message-id/16961-25f29f95b3604a8a%40postgresql.org
2.
https://www.postgresql.org/message-id/18394-e7459245148578b2@postgresql.org

I managed to reproduce the issue locally. 

Pre-conditions: 

1. postgres is deployed using docker. image:

https://hub.docker.com/layers/library/postgres/16.3/images/sha256-1b277d0af2273577d8547b295e3834baaf0c04f5b3b823882f686bac23502cf7

2. autovacuum is disabled (to avoid any automatic intervention), all other
properties use default values:

> cat postgres-test.conf | grep 'autovacuum = off'
autovacuum = off

3. test table is created in postgres database (used later to generate a high
number of update transactions):

postgres=# create table test(id int);
CREATE TABLE
postgres=# insert into test values (1);
INSERT 0 1


Steps to reproduce:

1. listen to the channel (backend 1): 

postgres=# LISTEN test_chan;
LISTEN

2. notify the channel (backend 2): 

postgres=# NOTIFY test_chan;
NOTIFY
postgres=# select txid_current();
 txid_current
--------------
          734
(1 row)

3. receive notification (backend 1):

postgres=# LISTEN test_chan;
LISTEN
Asynchronous notification "test_chan" received from server process with PID
86.

4. execute a simple update command on test table (from the pre-conditions
section) using pgbench in order to generate enough transactions to create
pg_xact/0001 file:

> pgbench --version

                                                     
pgbench (PostgreSQL) 16.3 (Debian 16.3-1.pgdg120+1)
> cat test.sql
UPDATE test SET id = 1;
> ls -lah /var/lib/postgresql/data/pg_xact
total 16K
drwx------  2 postgres postgres 4.0K Feb 11 12:45 .
drwx------ 19 postgres postgres 4.0K Feb 11 12:45 ..
-rw-------  1 postgres postgres 8.0K Feb 11 12:45 0000
> export PGPASSWORD=postgres; pgbench -h localhost -p 5432 -U postgres -c 80
-j 10 -t 15000 -f test.sql postgres

(pgbench generates 1_200_000 transactions in total, an empirically found
number that completely fills pg_xact/0000 and leads to pg_xact/0001
creation)

5. flush changes from shared buffered to disk (backend 3):

postgres=# CHECKPOINT;
CHECKPOINT

6. verify that pg_xact/0001 is created:

> ls -lah /var/lib/postgresql/data/pg_xact
total 304K
drwx------  2 postgres postgres 4.0K Feb 11 13:00 .
drwx------ 19 postgres postgres 4.0K Feb 11 12:45 ..
-rw-------  1 postgres postgres 256K Feb 11 13:00 0000
-rw-------  1 postgres postgres  40K Feb 11 13:00 0001

7. execute VACUUM FREEZE on each database at the server to freeze rows and
purge pg_xact/0000 (backend 4):

postgres=# VACUUM FREEZE;
VACUUM

postgres=# \c template1
You are now connected to database "template1" as user "postgres".
template1=# VACUUM FREEZE;
VACUUM

template1=# ALTER DATABASE template0 WITH ALLOW_CONNECTIONS true;
ALTER DATABASE
template1=# \c template0
You are now connected to database "template0" as user "postgres".
template0=# VACUUM FREEZE;
VACUUM

template0=# select datname, datfrozenxid from pg_database;
  datname  | datfrozenxid
-----------+--------------
 postgres  |      1200736
 template0 |      1200737
 template1 |      1200736
(3 rows)

postgres=# \c template1
You are now connected to database "template1" as user "postgres".
template1=# VACUUM FREEZE;
VACUUM

8. ensure that pg_xact/0000 is gone

> ls -lah /var/lib/postgresql/data/pg_xact
total 48K
drwx------  2 postgres postgres 4.0K Feb 11 15:23 .
drwx------ 19 postgres postgres 4.0K Feb 11 12:45 ..
-rw-------  1 postgres postgres  40K Feb 11 13:00 0001

9. try to listen to test_chan (backend 5):

postgres=# LISTEN test_chan;
ERROR:  could not access status of transaction 733
DETAIL:  Could not open file "pg_xact/0000": No such file or directory.

10. ensure that it is impossible to listen to any otherchannel (backend
6):

postgres=# LISTEN another_test_chan;
ERROR:  could not access status of transaction 733
DETAIL:  Could not open file "pg_xact/0000": No such file or directory

Expected behaviour: 

I should be able to listen to any channel in the database after Step 7

NOTE:

I have tried the same steps on the latest minor release for my major version
(16.6, to be precise) and the issue still persists.


pgsql-bugs by date:

Previous
From: Tom Lane
Date:
Subject: Re: BUG #18803: ERROR: wrong varnullingrels (b) (expected (b 4)) for Var 2/1
Next
From: Marko Tiikkaja
Date:
Subject: Re: BUG #18803: ERROR: wrong varnullingrels (b) (expected (b 4)) for Var 2/1