On Thu, Dec 28, 2023 at 11:33:16AM +1300, Thomas Munro wrote:
> I guess the large object usage isn't directly relevant (that module's
> EOXact stuff seems to be finished before TRANS_COMMIT, but I don't
> know that code well). Everything later is supposed to be about
> closing/releasing/cleaning up, and for example smgrDoPendingDeletes()
> reaches code with this relevant comment:
>
> * Note: smgr_unlink must treat deletion failure as a WARNING, not an
> * ERROR, because we've already decided to commit or abort the current
> * xact.
>
> We don't really have a general ban on ereporting on system call
> failure, though. We've just singled unlink() out. Only a few lines
> above that we call DropRelationsAllBuffers(rels, nrels), and that
> calls smgrnblocks(), and that might need to re-open() the
> relation file to do lseek(SEEK_END), because PostgreSQL itself has no
> tracking of relation size. Hard to say but my best guess is that's
> where you might have got your EIO, assuming you dropped the relation
> in this transaction?
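
For illustration, a minimal sketch of the open() + lseek(SEEK_END) pattern
described above. This is not PostgreSQL's code: file_size_by_seek() is a
hypothetical helper, and it only shows that finding a file's size this way
goes through system calls, each of which can fail with errno set.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not a PostgreSQL function): find a file's size by
 * opening it and seeking to its end.  Returns 0 and stores the size in
 * *size on success; returns -1 with errno set if open() or lseek() fails.
 */
int
file_size_by_seek(const char *path, off_t *size)
{
    int     fd;
    int     saved_errno;
    off_t   end;

    fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;              /* errno was set by open(), e.g. ENOENT */

    end = lseek(fd, 0, SEEK_END);
    saved_errno = errno;        /* close() below may overwrite errno */
    close(fd);

    if (end == (off_t) -1)
    {
        errno = saved_errno;    /* report the lseek() failure */
        return -1;
    }

    *size = end;
    return 0;
}
```

A caller that must not fail at this point would have to check the return
value and decide what to do with errno, rather than assume the size is
always available.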
Yeah. In fact I was confused - this was not lo_unlink().
This uses normal tables, so would've done:
"begin;"
"DROP TABLE IF EXISTS %s", tablename
"DELETE FROM cached_objects WHERE cache_name=%s", tablename
"commit;"
> > This is pg16 compiled at efa8f6064, running under CentOS 7. ZFS is 2.2.2,
> > but the pool hasn't been upgraded to use the features new since 2.1.
>
> I've been following recent ZFS stuff from a safe distance as a user.
> AFAIK the extremely hard to hit bug fixed in that very recent release
> didn't technically require the interesting new feature (namely block
> cloning, though I think that helped people find the root cause after a
> phase of false blame?). Anyway, its symptom was some bogus zero
> bytes on read, not a spurious EIO.
The ZFS bug had to do with bogus bytes which may-or-may-not-be-zero, as
I understand it. The understanding is that the bug was pre-existing but
became easier to hit in 2.2, and is fixed in 2.2.2 and 2.1.14.
--
Justin