Re: broken tables on hot standby after migration on PostgreSQL 16 (3x times last month) - Mailing list pgsql-hackers

From Peter Geoghegan
Subject Re: broken tables on hot standby after migration on PostgreSQL 16 (3x times last month)
Msg-id CAH2-WznVTBVQW0wPrvUxDxMs1mLZS1L0HaBZG-xEmsA+qw_ABw@mail.gmail.com
In response to broken tables on hot standby after migration on PostgreSQL 16 (3x times last month)  (Pavel Stehule <pavel.stehule@gmail.com>)
Responses Re: broken tables on hot standby after migration on PostgreSQL 16 (3x times last month)
List pgsql-hackers
On Fri, May 17, 2024 at 9:13 AM Pavel Stehule <pavel.stehule@gmail.com> wrote:
> after migration to PostgreSQL 16 I have seen broken tables on replica nodes 3 times (about once a week). The query fails
> with error
>
> ERROR:  could not access status of transaction 1442871302
> DETAIL:  Could not open file "pg_xact/0560": No such file or directory

You've shown an inconsistency between the primary and standby with
respect to the heap tuple infomask bits related to freezing. It looks
like a FREEZE WAL record from the primary was never replayed on the
standby.
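
A small worked example ties the error message to this explanation. Commit status lives in pg_xact, and old pg_xact segments are removed once the oldest datfrozenxid in the cluster has advanced past them (a truncation the standby also replays). A standby tuple that never received its freeze bits can therefore still point at an XID whose segment is gone. The sketch assumes PostgreSQL's default 8 kB BLCKSZ (2 status bits per transaction, 32 SLRU pages per segment file). The function name pg_xact_segment is hypothetical. Under those assumptions, XID 1442871302 from the report maps to pg_xact/0560, the file named in the DETAIL line.

```python
# Assumed PostgreSQL defaults (not stated in the thread): 8 kB pages,
# 2 commit-status bits per transaction, 32 SLRU pages per segment file.
BLCKSZ = 8192
CLOG_XACTS_PER_BYTE = 4                                     # 2 bits each
CLOG_XACTS_PER_PAGE = BLCKSZ * CLOG_XACTS_PER_BYTE          # 32768
SLRU_PAGES_PER_SEGMENT = 32
XACTS_PER_SEGMENT = CLOG_XACTS_PER_PAGE * SLRU_PAGES_PER_SEGMENT  # 1048576


def pg_xact_segment(xid: int) -> str:
    """Hypothetical helper: name of the pg_xact file holding xid's status."""
    page = xid // CLOG_XACTS_PER_PAGE          # which 8 kB status page
    segment = page // SLRU_PAGES_PER_SEGMENT   # which 32-page segment file
    return f"pg_xact/{segment:04X}"            # 4 hex digits, e.g. 0560


print(pg_xact_segment(1442871302))  # pg_xact/0560
```

Each segment covers about a million XIDs, so a single missed FREEZE record is enough to leave a tuple referring to a segment that has since been truncated.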

It's natural for me to wonder if my Postgres 16 work on page-level
freezing might be a factor here. If that really was true, then it
would be necessary to explain why the primary and standby are
inconsistent (no reason to suspect a problem on the primary here).
It'd have to be the kind of issue that could be detected mechanically
using wal_consistency_checking, but wasn't detected that way before
now -- that seems unlikely.

It's worth considering if the more aggressive behavior around
relfrozenxid advancement (in 15) and freezing (in 16) has increased
the likelihood of problems like these in setups that were already
faulty, in whatever way. The standby database is indeed corrupt, but
even on 16 it's fairly isolated corruption in practical terms. The
full extent of the problem is clear once amcheck is run, but only one
tuple can actually cause the system to error due to the influence of
hint bits (for better or worse, hint bits mask the problem quite well,
even on 16).
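
The masking effect of hint bits can be modeled loosely. The constants are the t_infomask bits from PostgreSQL's src/include/access/htup_details.h. The function xmin_status and the dict standing in for pg_xact are hypothetical. This is a simplified sketch, not the real visibility code: it ignores in-progress transactions, snapshots, xmax and multixacts. It only shows that a tuple missing its freeze bits consults pg_xact if no hint bit answers the question first.

```python
from typing import Dict

# t_infomask bits (values from htup_details.h); "frozen" is both bits set.
HEAP_XMIN_COMMITTED = 0x0100
HEAP_XMIN_INVALID = 0x0200
HEAP_XMIN_FROZEN = HEAP_XMIN_COMMITTED | HEAP_XMIN_INVALID


def xmin_status(infomask: int, xmin: int, clog: Dict[int, str]) -> str:
    """Simplified model of resolving xmin's fate; clog stands in for pg_xact."""
    if (infomask & HEAP_XMIN_FROZEN) == HEAP_XMIN_FROZEN:
        return "frozen"                 # treated as committed; no pg_xact read
    if infomask & HEAP_XMIN_COMMITTED:
        return "committed (hint bit)"   # hint bit answers; no pg_xact read
    if infomask & HEAP_XMIN_INVALID:
        return "aborted (hint bit)"     # hint bit answers; no pg_xact read
    # No freeze or hint bits: status must come from pg_xact.
    if xmin not in clog:
        raise LookupError(f"could not access status of transaction {xmin}")
    return clog[xmin]


truncated_clog: Dict[int, str] = {}  # old segment already removed
xid = 1442871302
print(xmin_status(HEAP_XMIN_FROZEN, xid, truncated_clog))     # frozen
print(xmin_status(HEAP_XMIN_COMMITTED, xid, truncated_clog))  # committed (hint bit)
try:
    xmin_status(0, xid, truncated_clog)
except LookupError as err:
    print(err)  # could not access status of transaction 1442871302
```

Only the last call reaches the missing status data, because that tuple has neither freeze nor hint bits. This matches the report: a single tuple raised the error, while amcheck exposed the wider damage.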

--
Peter Geoghegan