Thread: Assertion with aborted UPDATE in subtransaction

Assertion with aborted UPDATE in subtransaction

From
Jasper Smit
Date:
Hi,

My colleague Oleksii Kozlov ran into an assertion failure while testing aborted UPDATE commands in subtransactions.
To reproduce it, run the SQL in the attached script. I tested this on PostgreSQL 15.10 and 17.4.

Running the script leads to the following assertion failure:
TRAP: failed Assert("HEAP_XMAX_IS_LOCKED_ONLY(infomask_lock_old_tuple)"), File: "/usr/local/postgresql-17.4/debug-build/../src/backend/access/heap/heapam.c", Line: 3766, PID: 15604

After analysis with Luc Vlaming, we believe that the problem is caused by a stale multixact member of an aborted subtransaction.

At the time of the assertion, we established that the new tuple does not fit on the same page as the old tuple. The
tuple lock needs to be updated while the page lock is temporarily released.

One line above the assertion, compute_new_xmax_infomask() is called, which will in turn call MultiXactIdExpand().
In MultiXactIdExpand(), the requested xid/status turns out to already be a member of the current multixact, so the function
returns early and skips the removal of dead members further below. The multixact does in fact include an aborted transaction.
Because the aborted transaction was not removed, GetMultiXactIdHintBits() later does not add HEAP_XMAX_LOCK_ONLY to the infomask.
The absence of this bit in the infomask eventually leads to the assertion failure.
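
The chain of events described above can be illustrated with a small standalone model. This is only a sketch under
simplifying assumptions and is not the PostgreSQL source: all Toy* names are hypothetical, each member carries its
own transaction state instead of looking it up by xid, the six multixact status values are collapsed into one
is_update flag, and the prune_first switch is an illustration of the idea rather than the attached patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_MEMBERS 8

/* Hypothetical stand-ins for transaction state and multixact members. */
typedef enum { TOY_IN_PROGRESS, TOY_COMMITTED, TOY_ABORTED } ToyXactState;

typedef struct {
    unsigned     xid;
    bool         is_update;   /* true: member updated the tuple; false: lock only */
    ToyXactState state;       /* assumption: stored inline instead of looked up */
} ToyMember;

typedef struct {
    ToyMember members[TOY_MAX_MEMBERS];
    size_t    n;
} ToyMulti;

/* Is (xid, is_update) already a member? */
static bool toy_has_member(const ToyMulti *m, unsigned xid, bool is_update)
{
    for (size_t i = 0; i < m->n; i++)
        if (m->members[i].xid == xid && m->members[i].is_update == is_update)
            return true;
    return false;
}

/* Keep only live members: still running, or a committed update. */
static ToyMulti toy_prune(const ToyMulti *m)
{
    ToyMulti out = { .n = 0 };
    for (size_t i = 0; i < m->n; i++) {
        const ToyMember *mem = &m->members[i];
        if (mem->state == TOY_IN_PROGRESS ||
            (mem->is_update && mem->state == TOY_COMMITTED))
            out.members[out.n++] = *mem;
    }
    return out;
}

/*
 * Model of expanding a multixact. With prune_first == false, an existing
 * member triggers an early return, so dead members are never removed.
 * With prune_first == true (hypothetical variant), dead members are removed
 * before the membership check.
 */
static ToyMulti toy_expand(const ToyMulti *old, ToyMember add, bool prune_first)
{
    if (!prune_first && toy_has_member(old, add.xid, add.is_update))
        return *old;                      /* shortcut: stale members survive */

    ToyMulti out = toy_prune(old);
    if (!toy_has_member(&out, add.xid, add.is_update)) {
        assert(out.n < TOY_MAX_MEMBERS);  /* toy capacity check */
        out.members[out.n++] = add;
    }
    return out;
}

/* Model of the hint-bit decision: "lock only" iff no member is an update. */
static bool toy_lock_only(const ToyMulti *m)
{
    for (size_t i = 0; i < m->n; i++)
        if (m->members[i].is_update)
            return false;
    return true;
}
```

As an illustrative scenario (not necessarily the exact sequence in the attached script), take a multixact with a
live lock-only member and an update member from an aborted subtransaction, then request the lock-only member again.
With prune_first set to false, toy_expand() returns the multixact unchanged and toy_lock_only() reports false, which
corresponds to the missing HEAP_XMAX_LOCK_ONLY bit. With prune_first set to true, the aborted update is dropped and
toy_lock_only() reports true.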

A possible fix is to change MultiXactIdExpand() to not skip the removal of dead members. See the proposed patch attached to this email.
An alternative is to remove the assertion, since I think the transaction status of multixact members is checked at the relevant places anyway.

Regards,
Jasper Smit

Attachment

Re: Assertion with aborted UPDATE in subtransaction

From
Jasper Smit
Date:
Hi,

Is this assertion failure something that is worth fixing?

Thanks,
Jasper Smit

On Wed, Mar 26, 2025 at 4:26 PM Jasper Smit <jbsmit@gmail.com> wrote:
Hi,

My colleague Oleksii Kozlov ran into an assertion failure while testing aborted UPDATE commands in subtransactions.
To reproduce it, run the SQL in the attached script. I tested this on PostgreSQL 15.10 and 17.4.

Running the script leads to the following assertion failure:
TRAP: failed Assert("HEAP_XMAX_IS_LOCKED_ONLY(infomask_lock_old_tuple)"), File: "/usr/local/postgresql-17.4/debug-build/../src/backend/access/heap/heapam.c", Line: 3766, PID: 15604

After analysis with Luc Vlaming, we believe that the problem is caused by a stale multixact member of an aborted subtransaction.

At the time of the assertion, we established that the new tuple does not fit on the same page as the old tuple. The
tuple lock needs to be updated while the page lock is temporarily released.

One line above the assertion, compute_new_xmax_infomask() is called, which will in turn call MultiXactIdExpand().
In MultiXactIdExpand(), the requested xid/status turns out to already be a member of the current multixact, so the function
returns early and skips the removal of dead members further below. The multixact does in fact include an aborted transaction.
Because the aborted transaction was not removed, GetMultiXactIdHintBits() later does not add HEAP_XMAX_LOCK_ONLY to the infomask.
The absence of this bit in the infomask eventually leads to the assertion failure.

A possible fix is to change MultiXactIdExpand() to not skip the removal of dead members. See the proposed patch attached to this email.
An alternative is to remove the assertion, since I think the transaction status of multixact members is checked at the relevant places anyway.

Regards,
Jasper Smit