pá 17. 5. 2024 v 21:50 odesílatel Andres Freund <andres@anarazel.de> napsal:
Hi,
On 2024-05-17 15:12:31 +0200, Pavel Stehule wrote: > after migration on PostgreSQL 16 I seen 3x times (about every week) broken > tables on replica nodes. The query fails with error
Migrating from what version?
I think 14, but it should be verified tomorrow
upgrade was from 15.2
You're saying that the data is correctly accessible on primaries, but broken on standbys? Is there any difference in how the page looks like on the primary vs standby?
I saved one page from master and standby. Can I send it privately? There are some private data (although not too sensitive)
> ERROR: could not access status of transaction 1442871302 > DETAIL: Could not open file "pg_xact/0560": No such file or directory > > verify_heapam reports > > ^[[Aprd=# select * from verify_heapam('account_login_history') where blkno > = 179036; > blkno | offnum | attnum | msg > > --------+--------+--------+------------------------------------------------------------------- > 179036 | 30 | | xmin 1393743382 precedes oldest valid > transaction ID 3:1687012112
Is the replica behind the primary? Or did we somehow end up with diverging data? The page LSNs differ by about 40GB...
Is there evidence of failed truncations of the relation in the log? From autovacuum?
no I did not see these bugs,
Does the data in the readable versions of the tuples on that page actually look valid? Is it possibly duplicated data?
looks well, I didn't see any strange content
I'm basically wondering whether it's possible that we errored out during truncation (e.g. due to a file permission issue or such). Due to some brokenness in RelationTruncate() that can lead to data divergence between primary and standby and to old tuples re-appearing on either.