On Fri, May 5, 2023 at 6:11 AM Evgeny Morozov
<postgresql3@realityexists.net> wrote:
> Meanwhile, what do I do with the existing server, though? Just try to
> drop the problematic DBs again manually?
That earlier link to a FreeBSD thread is surely about bleeding edge
new ZFS stuff that was briefly broken and then fixed, discovered by
people running code imported from the OpenZFS master branch into the
FreeBSD main branch (ie it's not exactly released; I'm not following
the details but I think it might soon be 2.2?). You, on the other
hand, are talking about an LTS Ubuntu release from 2018, which
shipped "ZFS on Linux" version 0.7.5, unless you installed a newer
version somehow? So it doesn't sound like it could be related.
That doesn't mean it couldn't be a different ZFS bug though. While
looking into file system corruption with similar symptoms on another
file system (which turned out to be a bug in btrfs), I did bump into
a claim that ZFS could produce unexpected zeroes in some mmap
coherency scenario (OpenZFS issue #14548). I don't immediately see
how PostgreSQL could get tangled up with that problem though, as we
aren't doing that...
It seems quite interesting that it's always pg_class_oid_index block
0 (the btree meta-page), which feels more like a PostgreSQL bug,
unless the access pattern of that particular file/block is somehow
highly unusual compared to every other block and is tickling bugs
elsewhere in the stack. What does that file look like in terms of
size, and how many pages in it are zero? I think it should be called
base/5/2662.
Oooh, but this is a relation that goes through
RelationMapOidToFilenumber. What does select
pg_relation_filepath('pg_class_oid_index') show in the corrupted
database, base/5/2662 or something else? Now *that* is a piece of
logic that changed in PostgreSQL 15. It changed from sector-based
atomicity assumptions to a directory entry swizzling trick, in commit
d8cd0c6c95c0120168df93aae095df4e0682a08a. Hmm.
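If it's useful, here's roughly what I'd run in the corrupted database
and in a healthy one for comparison (nothing exotic, just the forward
and reverse mappings; the 0 means the database's default tablespace,
and the reverse lookup returns NULL if nothing currently maps to that
filenode, e.g. if the relation mapper has pointed pg_class_oid_index
somewhere else):

    -- where does the relation mapper say pg_class_oid_index lives?
    select pg_relation_filepath('pg_class_oid_index');

    -- and is anything currently mapped to filenode 2662?
    select pg_filenode_relation(0, 2662);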