On Thu, Dec 28, 2023 at 4:02 AM Justin Pryzby <pryzby@telsasoft.com> wrote:
> My main question is why an IO error would cause the DB to abort, rather
> than raising an ERROR.

In CommitTransaction() there is a stretch of code beginning s->state =
TRANS_COMMIT and ending s->state = TRANS_DEFAULT, from which we call
out to various subsystems' AtEOXact_XXX() functions. There is no way
to roll back in that state, so anything that throws ERROR from those
routines is going to get something much like $SUBJECT. Hmm, we'd know
which exact code path got that EIO from your smoldering core if we'd
put an explicit critical section there (if we're going to PANIC
anyway, it might as well not be from a different stack after
longjmp()...).
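
To make the difference concrete, here is a toy model. It is not PostgreSQL source: the Backend struct, the PanicSite enum and effective_level() are made up for this sketch, and only the TRANS_* names are borrowed from xact.c. It assumes that an ERROR raised inside a critical section is promoted to PANIC right at the call site, whereas an ERROR raised in TRANS_COMMIT without one only becomes a PANIC later, after the longjmp(), when abort processing finds it cannot roll back.

```c
#include <stdbool.h>

/* Hypothetical, simplified stand-ins for elog levels and xact states. */
typedef enum { LVL_WARNING, LVL_ERROR, LVL_PANIC } Level;
typedef enum { TRANS_INPROGRESS, TRANS_COMMIT, TRANS_DEFAULT } TransState;

typedef struct
{
    TransState  state;              /* current low-level transaction state */
    int         crit_section_count; /* > 0 while inside a critical section */
} Backend;

/* Where the process would die, if it dies at all. */
typedef enum { AT_NONE, AT_CALL_SITE, AT_ABORT_AFTER_LONGJMP } PanicSite;

/*
 * Toy model: given the level a routine asked for, return the level the
 * backend effectively ends up with, and record where a PANIC would fire.
 */
static Level
effective_level(const Backend *b, Level requested, PanicSite *site)
{
    *site = AT_NONE;

    /* WARNING and below never unwind the stack. */
    if (requested < LVL_ERROR)
        return requested;

    /* Inside a critical section, ERROR is promoted on the spot. */
    if (requested == LVL_PANIC || b->crit_section_count > 0)
    {
        *site = AT_CALL_SITE;
        return LVL_PANIC;
    }

    /* Past the point of no return: the abort path cannot roll back. */
    if (b->state == TRANS_COMMIT)
    {
        *site = AT_ABORT_AFTER_LONGJMP;
        return LVL_PANIC;
    }

    return LVL_ERROR;
}
```

In both TRANS_COMMIT cases the outcome is a PANIC; the only difference is whether the core file still shows the stack of the routine that hit the failure.
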
I guess the large object usage isn't directly relevant (that module's
EOXact stuff seems to be finished before TRANS_COMMIT, but I don't
know that code well). Everything later is supposed to be about
closing/releasing/cleaning up, and for example smgrDoPendingDeletes()
reaches code with this relevant comment:

* Note: smgr_unlink must treat deletion failure as a WARNING, not an
* ERROR, because we've already decided to commit or abort the current
* xact.
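
As a rough sketch of what that policy looks like in practice (this is not the md.c code; unlink_or_warn() is a hypothetical helper, and ignoring ENOENT is an assumption made for the example), a failed unlink() is reported and then swallowed instead of being thrown:

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper: try to remove a file during post-commit cleanup.
 * A failure is only reported as a warning, never raised, because the
 * caller has no way to roll back at this point.
 *
 * Returns 0 if the file is gone afterwards, -1 if removal failed.
 */
static int
unlink_or_warn(const char *path)
{
    if (unlink(path) < 0 && errno != ENOENT)
    {
        /* Stand-in for ereport(WARNING, ...); execution continues. */
        fprintf(stderr, "WARNING: could not remove file \"%s\": %s\n",
                path, strerror(errno));
        return -1;
    }
    return 0;
}
```
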
We don't really have a general ban on ereporting on system call
failure, though. We've just singled unlink() out. Only a few lines
above that we call DropRelationsAllBuffers(rels, nrels), and that
calls smgrnblocks(), and that might need to re-open() the
relation file to do lseek(SEEK_END), because PostgreSQL itself has no
tracking of relation size. Hard to say but my best guess is that's
where you might have got your EIO, assuming you dropped the relation
in this transaction?
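
For illustration, the size probe amounts to something like the following. This is a standalone sketch, not the smgr/md code: nblocks_by_lseek() is a hypothetical name, it handles a single file rather than a segmented relation, and BLCKSZ is hard-coded to the default 8192. The point is that both open() and lseek() are system calls that can fail, and neither failure is treated specially the way unlink() is.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

#define BLCKSZ 8192             /* assumed default block size */

/*
 * Hypothetical helper: report how many whole blocks a file contains by
 * seeking to its end.  Returns -1 with errno set if open() or lseek()
 * fails (for example with EIO).
 */
static long
nblocks_by_lseek(const char *path)
{
    int     fd;
    int     save_errno;
    off_t   end;

    fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;              /* errno set by open() */

    end = lseek(fd, 0, SEEK_END);
    save_errno = errno;         /* close() may clobber errno */
    close(fd);

    if (end < 0)
    {
        errno = save_errno;
        return -1;
    }
    return (long) (end / BLCKSZ);
}
```
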
> This is pg16 compiled at efa8f6064, runing under centos7. ZFS is 2.2.2,
> but the pool hasn't been upgraded to use the features new since 2.1.

I've been following recent ZFS stuff from a safe distance as a user.
AFAIK the extremely hard-to-hit bug fixed in that very recent release
didn't technically require the interesting new feature (namely block
cloning, though I think that helped people find the root cause after a
phase of false blame?). Anyway, its symptom was some bogus zero
bytes on read, not a spurious EIO.