On Sun, Oct 24, 2021 at 4:40 AM PG Bug reporting form
<noreply@postgresql.org> wrote:
> PostgreSQL version: 14.0
> Operating system: FreeBSD 13.0-RELEASE
PostgreSQL 14 is the first version where the default wal_sync_method
is fdatasync() on FreeBSD -- though only with FreeBSD 13. See commit
f900a79e. Perhaps that has something to do with the problem seen here.
I have no concrete reason to suspect fdatasync() here -- I'm just
pattern matching. But the inconsistency that Andrew reported looks
like any other scenario where the system's wal_sync_method doesn't
provide the expected guarantees. This looks like generic corruption to
me. Andrew said that the "real entries for 'Ying_Swei' in the index
are not in the same page or even any nearby page". It is reasonable to
suppose that the TID in the index pointing to the wrong tuple in the
heap/table is from an index tuple that has no business being anywhere
near the leaf page that we find it in -- I surmise that it's too far
away in the keyspace for that. (I think that this was Andrew's exact
point, in fact, but I just want to make sure.)
I suspect that there was TID recycling by VACUUM involved here.
Originally, the leaf page image that Andrew examined might well have
actually pointed to the right thing in the heap. Perhaps VACUUM ran,
and recycled the relevant heap TID, while somehow leaving an older
pre-VACUUM leaf page behind (any kind of storage misbehavior could do
this).
There doesn't have to have been a VACUUM, actually. A simple hard
crash (with a misbehaving wal_sync_method or whatever) could easily
lead to the same symptoms. You just need to have heapam "allocate new
heap TIDs" and then forget about having done so. Without the index
structure also forgetting.
--
Peter Geoghegan