Re: cannot abort transaction 2737414167, it was already committed - Mailing list pgsql-hackers

From Justin Pryzby
Subject Re: cannot abort transaction 2737414167, it was already committed
Msg-id ZYyrZg-Lzoy9w3Fp@pryzbyj2023
In response to Re: cannot abort transaction 2737414167, it was already committed  (Thomas Munro <thomas.munro@gmail.com>)
List pgsql-hackers
On Thu, Dec 28, 2023 at 11:33:16AM +1300, Thomas Munro wrote:
> I guess the large object usage isn't directly relevant (that module's
> EOXact stuff seems to be finished before TRANS_COMMIT, but I don't
> know that code well).  Everything later is supposed to be about
> closing/releasing/cleaning up, and for example smgrDoPendingDeletes()
> reaches code with this relevant comment:
> 
>      * Note: smgr_unlink must treat deletion failure as a WARNING, not an
>      * ERROR, because we've already decided to commit or abort the current
>      * xact.
> 
> We don't really have a general ban on ereporting on system call
> failure, though.  We've just singled unlink() out.  Only a few lines
> above that we call DropRelationsAllBuffers(rels, nrels), and that
> calls smgrnblocks(), and that might need to re-open() the
> relation file to do lseek(SEEK_END), because PostgreSQL itself has no
> tracking of relation size.  Hard to say but my best guess is that's
> where you might have got your EIO, assuming you dropped the relation
> in this transaction?

Yeah.  In fact I was confused - this was not lo_unlink().
This uses normal tables, so would've done:

"begin;"
"DROP TABLE IF EXISTS %s", tablename
"DELETE FROM cached_objects WHERE cache_name=%s", tablename
"commit;"
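
To make the suspected failure path concrete, here is a minimal Python
sketch (not PostgreSQL code) of the two behaviours discussed above: finding
a file's size by open() plus lseek(SEEK_END), where either call can fail
with something like EIO, and an unlink that reports failure as a warning
instead of raising.  The function names file_size_by_seek() and
unlink_after_commit() are hypothetical, chosen only for illustration.

```python
import os
import warnings


def file_size_by_seek(path):
    """Return a file's size by opening it and seeking to its end.

    Both os.open() and os.lseek() can raise OSError (for example with
    errno.EIO); nothing here downgrades that, so the caller sees an error.
    """
    fd = os.open(path, os.O_RDONLY)
    try:
        # Seeking 0 bytes from the end returns the end offset, i.e. the size.
        return os.lseek(fd, 0, os.SEEK_END)
    finally:
        os.close(fd)


def unlink_after_commit(path):
    """Remove a file, reporting failure as a warning rather than an error.

    Returns True if the file was removed, False if the unlink failed.
    """
    try:
        os.unlink(path)
        return True
    except OSError as exc:
        # The outcome is already decided, so complain but carry on.
        warnings.warn(f"could not remove file {path!r}: {exc}")
        return False
```

The point of the contrast is that only the unlink step is defensive; a
failure in the size probe still propagates as an exception, which is the
analogue of an ERROR raised after the commit decision has been made.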

> > This is pg16 compiled at efa8f6064, running under CentOS 7.  ZFS is 2.2.2,
> > but the pool hasn't been upgraded to use the features new since 2.1.
> 
> I've been following recent ZFS stuff from a safe distance as a user.
> AFAIK the extremely hard to hit bug fixed in that very recent release
> didn't technically require the interesting new feature (namely block
> cloning, though I think that helped people find the root cause after a
> phase of false blame?).  Anyway, its symptom was some bogus zero
> bytes on read, not a spurious EIO.

The ZFS bug had to do with bogus bytes, which may or may not be zero, as
I understand it.  The understanding is that the bug was pre-existing but
became easier to hit in 2.2, and is fixed in 2.2.2 and 2.1.14.

-- 
Justin