Re: Assertion with aborted UPDATE in subtransaction - Mailing list pgsql-hackers

From Luc Vlaming Hummel
Subject Re: Assertion with aborted UPDATE in subtransaction
Msg-id DS0PR08MB105167EBA5FBBDF47A9FE14B49CB42@DS0PR08MB10516.namprd08.prod.outlook.com
In response to Re: Assertion with aborted UPDATE in subtransaction  (Jasper Smit <jbsmit@gmail.com>)
List pgsql-hackers
One consideration we ran into is that you will now make more calls to TransactionIdIsInProgress().
However, we think it would then also create a correct multixact containing only the right members, and thereby avoid the assertion.

The one thing we have not been able to figure out yet is what other fallout there would be if we don't fix this. Apart from the assertion triggering, we have not really observed any problem.

Can someone shed some more light on the tradeoffs here, and on whether this is the right fix or something else should be fixed instead?

Thanks,
Luc

From: Jasper Smit <jbsmit@gmail.com>
Date: Monday, 31. March 2025 at 12:02
To: pgsql-hackers@lists.postgresql.org <pgsql-hackers@lists.postgresql.org>
Subject: Re: Assertion with aborted UPDATE in subtransaction

Hi,

Is this assertion something that is worthwhile to fix?

Thanks,
Jasper Smit

On Wed, Mar 26, 2025 at 4:26 PM Jasper Smit <jbsmit@gmail.com> wrote:
Hi,

My colleague Oleksii Kozlov ran into an assertion while testing aborted UPDATE commands in subtransactions.
To reproduce this assertion, run the SQL in the attached script. I tested this on 15.10 and 17.4.

Running the script will lead to the assertion:
TRAP: failed Assert("HEAP_XMAX_IS_LOCKED_ONLY(infomask_lock_old_tuple)"), File: "/usr/local/postgresql-17.4/debug-build/../src/backend/access/heap/heapam.c", Line: 3766, PID: 15604

After analysis with Luc Vlaming, we believe that the problem is caused by a stale multixact member of an aborted subtransaction.

At the time of the assertion, we established that the new tuple does not fit on the same page as the old tuple. The
tuple lock needs to be updated while the page lock is temporarily released.

One line above the assertion, compute_new_xmax_infomask() is called, which will in turn call MultiXactIdExpand().
In MultiXactIdExpand() we determine that the requested txid/status is already a member of the current multixact, and therefore skip
the removal of dead members further below in that function. The multixact does in fact include an aborted transaction.
Because the aborted transaction was not removed, later in GetMultiXactIdHintBits(), HEAP_XMAX_LOCK_ONLY is not added to the infomask.
The absence of this bit in the infomask eventually leads to the assertion.

A possible fix is to change MultiXactIdExpand() to not skip the removal of dead members. See the proposed patch attached to this email.
An alternative is to remove the assertion, since I think the transaction statuses of multixact members get checked at the relevant places.
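
To make the mechanism above easier to follow, here is a minimal toy model in C. It is only a sketch under stated assumptions and is not the PostgreSQL code or the attached patch: every Toy*/toy_* name is hypothetical, the status set is reduced to "lock" and "update", xid 101 is simply assumed to be the aborted subtransaction, and the pruning step only keeps in-progress members. The prune_first flag switches between returning early on an exact (xid, status) match and pruning dead members before that check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: hypothetical stand-ins, not the PostgreSQL definitions. */
typedef enum { TOY_LOCK, TOY_UPDATE } ToyStatus;
typedef struct { unsigned xid; ToyStatus status; } ToyMember;

#define TOY_MAX_MEMBERS 8
typedef struct { ToyMember m[TOY_MAX_MEMBERS]; size_t n; } ToyMulti;

/* Assumed transaction state: xid 101 is an aborted subtransaction,
 * every other xid is still in progress. */
static bool toy_in_progress(unsigned xid)
{
    return xid != 101;
}

/* Add (xid, status) to a multixact.
 * prune_first == false: return early when the member already exists, so
 *                       dead members are left in place.
 * prune_first == true:  always drop members that are no longer in progress. */
static ToyMulti toy_expand(ToyMulti old, unsigned xid, ToyStatus status,
                           bool prune_first)
{
    ToyMulti out = { .n = 0 };
    bool found = false;

    if (!prune_first)
        for (size_t i = 0; i < old.n; i++)
            if (old.m[i].xid == xid && old.m[i].status == status)
                return old;             /* early exit, nothing pruned */

    for (size_t i = 0; i < old.n; i++) {
        if (!toy_in_progress(old.m[i].xid))
            continue;                   /* drop the dead member */
        if (old.m[i].xid == xid && old.m[i].status == status)
            found = true;
        out.m[out.n++] = old.m[i];
    }
    if (!found) {
        assert(out.n < TOY_MAX_MEMBERS);
        out.m[out.n++] = (ToyMember){ xid, status };
    }
    return out;
}

/* Stand-in for the hint-bit computation: the multixact counts as
 * "lock only" when no member has an update status. */
static bool toy_lock_only(const ToyMulti *mx)
{
    for (size_t i = 0; i < mx->n; i++)
        if (mx->m[i].status == TOY_UPDATE)
            return false;
    return true;
}

/* Starting point: a live locker (xid 100) plus the aborted updater (xid 101). */
static ToyMulti toy_scenario(void)
{
    ToyMulti mx = { .n = 2 };
    mx.m[0] = (ToyMember){ 100, TOY_LOCK };
    mx.m[1] = (ToyMember){ 101, TOY_UPDATE };
    return mx;
}
```

In this model, toy_expand(toy_scenario(), 100, TOY_LOCK, false) returns the multixact unchanged, so toy_lock_only() reports false because the aborted updater is still a member. With prune_first set to true, only xid 100 remains and toy_lock_only() reports true, at the cost of calling toy_in_progress() for every member on each expansion.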

Regards,
Jasper Smit
