On Sun, 24 Aug 2025 at 18:34, Yugo Nagata <nagata@sraoss.co.jp> wrote:
>
> I confirmed this issue by executing the following query concurrently
> in three transactions. (With only two transactions, the issue does not occur.)
Yes, I think 3 transactions are required to reproduce this (2 separate
concurrent updates).
> I don't completely understand how this race condition occurs,
> but I believe the bug is due to the misuse of TM_FailureData
> returned by table_tuple_lock in ExecMergeMatched().
>
> Currently, TM_FailureData.ctid is used as a reference to the
> latest version of oldtuple, but this is not always correct.
> Instead, the tupleid passed to table_tuple_lock should be used.
>
> I've attached a patch to fix this.
Thanks. That makes sense.

I think we should also update the isolation tests to cover this.
Attached is an update to the merge-match-recheck isolation test that
does so. As you found, the unpatched code doesn't seem to fail every
time (though I didn't look into why), but with your patch, the test
always passes.
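
To make the distinction quoted above concrete (TM_FailureData.ctid versus
the tupleid passed to table_tuple_lock), here is a toy model in C. It is
not PostgreSQL code: every name in it (ToyTuple, ToyFailureData,
toy_lock_latest, toy_run) is hypothetical, and it rests on the assumption
that the failure data is filled in only when a lock attempt hits an
already-updated version, while the caller's tuple id is advanced in place
to the version finally locked. Whether that matches the real mechanism
has not been verified here.

```c
#include <assert.h>

/* Toy model only: none of these types or functions exist in PostgreSQL. */
#define TOY_NO_NEXT (-1)
#define TOY_MAX_VERSIONS 8

typedef struct { int next; } ToyTuple;        /* index of newer version */
typedef struct { int ctid; } ToyFailureData;  /* successor seen on failure */

/* One lock attempt on one version. Returns 1 on success. On failure,
 * records the immediate successor in fd (the assumption stated above). */
static int toy_lock_one(const ToyTuple *heap, int tid, ToyFailureData *fd)
{
    if (heap[tid].next == TOY_NO_NEXT)
        return 1;
    fd->ctid = heap[tid].next;
    return 0;
}

/* Lock the latest version, advancing *tid in place. After a failed
 * attempt, the rest of the update chain is walked without touching fd. */
static void toy_lock_latest(const ToyTuple *heap, int *tid, ToyFailureData *fd)
{
    while (!toy_lock_one(heap, *tid, fd))
    {
        int cur = fd->ctid;

        while (heap[cur].next != TOY_NO_NEXT)
            cur = heap[cur].next;
        *tid = cur;
    }
}

/* Build a chain with n_updates newer versions after version 0, lock
 * starting from version 0, and report both candidate "latest" ids. */
void toy_run(int n_updates, int *locked_tid, int *fd_ctid)
{
    ToyTuple heap[TOY_MAX_VERSIONS];
    ToyFailureData fd = { TOY_NO_NEXT };
    int tid = 0;
    int i;

    assert(n_updates >= 0 && n_updates < TOY_MAX_VERSIONS);
    for (i = 0; i <= n_updates; i++)
        heap[i].next = (i < n_updates) ? i + 1 : TOY_NO_NEXT;

    toy_lock_latest(heap, &tid, &fd);
    *locked_tid = tid;
    *fd_ctid = fd.ctid;
}
```

In this model, toy_run(1, ...) reports the same id through both outputs,
whereas toy_run(2, ...) leaves fd_ctid pointing at the intermediate
version while locked_tid holds the latest one. That would be consistent
with needing two concurrent updates to see the problem, but it is only an
illustration under the stated assumption.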
Regards,
Dean