broken reading on standby (PostgreSQL 16.2) - Mailing list pgsql-hackers

From Pavel Stehule
Subject broken reading on standby (PostgreSQL 16.2)
Date
Msg-id CAFj8pRBEFMxxFSCVOSi-4n0jHzSaxh6Ze_cZid5eG=tsnn49-A@mail.gmail.com
Whole thread Raw
Responses Re: broken reading on standby (PostgreSQL 16.2)
Re: broken reading on standby (PostgreSQL 16.2)
List pgsql-hackers
Hi

yesterday, I had to fix strange issue on standby server

The query to freshly updated data fails

select * from seller_success_rate where create_time::date = '2024-04-23';
ERROR:  58P01: could not access status of transaction 1393466389
DETAIL:  Could not open file "pg_xact/0530": No such file or directory.
LOCATION:  SlruReportIOError, slru.c:947

amcheck

select * from verify_heapam('seller_success_rate');
 blkno | offnum | attnum |                                msg                                
-------+--------+--------+-------------------------------------------------------------------
  5763 |    111 |        | xmin 1393466389 precedes oldest valid transaction ID 3:1523885078
  5863 |    109 |        | xmin 1393466389 precedes oldest valid transaction ID 3:1523885078
  5863 |    110 |        | xmin 1393466389 precedes oldest valid transaction ID 3:1523885078
  5868 |    110 |        | xmin 1393466389 precedes oldest valid transaction ID 3:1523885078
  5868 |    111 |        | xmin 1393466389 precedes oldest valid transaction ID 3:1523885078
  5875 |    111 |        | xmin 1393466389 precedes oldest valid transaction ID 3:1523885078
  5895 |    109 |        | xmin 1393466389 precedes oldest valid transaction ID 3:1523885078
  5895 |    110 |        | xmin 1439564642 precedes oldest valid transaction ID 3:1523885078
  6245 |    108 |        | xmin 1393466389 precedes oldest valid transaction ID 3:1523885078
  6245 |    109 |        | xmin 1393466389 precedes oldest valid transaction ID 3:1523885078
  6245 |    110 |        | xmin 1439564642 precedes oldest valid transaction ID 3:1523885078
  6245 |    112 |        | xmin 1424677216 precedes oldest valid transaction ID 3:1523885078
  6378 |    109 |        | xmin 1393466389 precedes oldest valid transaction ID 3:1523885078
  6378 |    110 |        | xmin 1393466389 precedes oldest valid transaction ID 3:1523885078
  6382 |    110 |        | xmin 1393466389 precedes oldest valid transaction ID 3:1523885078
  6590 |    110 |        | xmin 1393466389 precedes oldest valid transaction ID 3:1523885078
  6590 |    111 |        | xmin 1393466389 precedes oldest valid transaction ID 3:1523885078
  7578 |    112 |        | xmin 1393466389 precedes oldest valid transaction ID 3:1523885078
  7581 |    112 |        | xmin 1393466389 precedes oldest valid transaction ID 3:1523885078
  8390 |    112 |        | xmin 1393466389 precedes oldest valid transaction ID 3:1523885078
 10598 |    109 |        | xmin 1393466389 precedes oldest valid transaction ID 3:1523885078
 10598 |    110 |        | xmin 1393466389 precedes oldest valid transaction ID 3:1523885078

I verified xmin against the primary server, and it was the same. There was not any replication gap.

I checked the fields from pg_database table, and looks same too

These rows were valid (and visible) on primary.

On this server there was not any long session (when I was connected), unfortunately I cannot test restart of this server.  One wal sender is executing on standby. Fortunately, there was a possibility to run VACUUM FULL, and it fixed the issue.

The customer has archived wals.

My question - is it possible to do some diagnostics from SQL level? I didn't find a way to get values that are used for comparison by amcheck from SQL.

Regards

Pavel

pgsql-hackers by date:

Previous
From: Alexander Lakhin
Date:
Subject: Re: Avoid orphaned objects dependencies, take 3
Next
From: "Andrey M. Borodin"
Date:
Subject: Re: broken reading on standby (PostgreSQL 16.2)