Thread: BUG #17809: MERGE ... UPDATE fails with BEFORE ROW UPDATE trigger when target row updated concurrently

The following bug has been logged on the website:

Bug reference:      17809
Logged by:          Alexander Lakhin
Email address:      exclusion@gmail.com
PostgreSQL version: 15.2
Operating system:   Ubuntu 22.04
Description:

When executing the following script:

cat << "EOF" | psql
CREATE TABLE target (tid integer, val integer);
INSERT INTO target VALUES (1, 0);
CREATE OR REPLACE FUNCTION brut_func() RETURNS trigger LANGUAGE plpgsql AS
'BEGIN RETURN NEW; END;';
CREATE TRIGGER brut BEFORE UPDATE ON target FOR EACH ROW EXECUTE PROCEDURE
brut_func();
EOF

for ((i=1;i<=10;i++)); do
  echo "iteration $i"
  psql -c "UPDATE TARGET SET val = 0" &
  psql -c "MERGE INTO target t1 USING target t2 ON t1.tid = t2.tid WHEN
MATCHED THEN UPDATE SET val = 0" &
  wait
  ls tmpdb/core* && break
  coredumpctl --no-pager && break
done

I get a server crash with the following stack trace:
Program terminated with signal SIGSEGV, Segmentation fault.

warning: Section `.reg-xstate/1808258' in core file too small.
#0  0x000055e66490440e in internalGetUpdateNewTuple (relinfo=0x55e66595e040,
planSlot=0x55e665998288, 
    oldSlot=0x55e665969978, relaction=0x0) at nodeModifyTable.c:741
741             econtext = newProj->pi_exprContext;
(gdb) bt
#0  0x000055e66490440e in internalGetUpdateNewTuple (relinfo=0x55e66595e040,
planSlot=0x55e665998288, 
    oldSlot=0x55e665969978, relaction=0x0) at nodeModifyTable.c:741
#1  0x000055e6649043e0 in ExecGetUpdateNewTuple (relinfo=0x55e66595e040,
planSlot=0x55e665998288, 
    oldSlot=0x55e665969978) at nodeModifyTable.c:726
#2  0x000055e66487e835 in ExecBRUpdateTriggers (estate=0x55e66595dba0,
epqstate=0x55e66595df10, 
    relinfo=0x55e66595e040, tupleid=0x7ffec9335002, fdw_trigtuple=0x0,
newslot=0x55e665969748, tmfd=0x7ffec93350b0)
    at trigger.c:3037
#3  0x000055e66490622e in ExecUpdatePrologue (context=0x7ffec9335080,
resultRelInfo=0x55e66595e040, 
    tupleid=0x7ffec9335002, oldtuple=0x0, slot=0x55e665969748) at
nodeModifyTable.c:1912
#4  0x000055e6649078b2 in ExecMergeMatched (context=0x7ffec9335080,
resultRelInfo=0x55e66595e040, 
    tupleid=0x7ffec9335002, canSetTag=true) at nodeModifyTable.c:2884
#5  0x000055e6649075da in ExecMerge (context=0x7ffec9335080,
resultRelInfo=0x55e66595e040, tupleid=0x7ffec9335002, 
    canSetTag=true) at nodeModifyTable.c:2750
#6  0x000055e6649097f2 in ExecModifyTable (pstate=0x55e66595de28) at
nodeModifyTable.c:3866
#7  0x000055e6648c60d7 in ExecProcNodeFirst (node=0x55e66595de28) at
execProcnode.c:464
#8  0x000055e6648b9353 in ExecProcNode (node=0x55e66595de28) at
../../../src/include/executor/executor.h:259
#9  0x000055e6648bc25a in ExecutePlan (estate=0x55e66595dba0,
planstate=0x55e66595de28, use_parallel_mode=false, 
    operation=CMD_MERGE, sendTuples=false, numberTuples=0,
direction=ForwardScanDirection, dest=0x55e665955bf8, 
    execute_once=true) at execMain.c:1636
#10 0x000055e6648b9a8a in standard_ExecutorRun (queryDesc=0x55e66594c960,
direction=ForwardScanDirection, count=0, 
    execute_once=true) at execMain.c:363
#11 0x000055e6648b9875 in ExecutorRun (queryDesc=0x55e66594c960,
direction=ForwardScanDirection, count=0, 
    execute_once=true) at execMain.c:307
#12 0x000055e664b31952 in ProcessQuery (plan=0x55e665955b08, 
    sourceText=0x55e6658593b0 "MERGE INTO target t1 USING target t2 ON
t1.tid = t2.tid WHEN MATCHED THEN UPDATE SET val = 0", params=0x0,
queryEnv=0x0, dest=0x55e665955bf8, qc=0x7ffec93354e0) at pquery.c:160
#13 0x000055e664b33545 in PortalRunMulti (portal=0x55e6658d5190,
isTopLevel=true, setHoldSnapshot=false, 
    dest=0x55e665955bf8, altdest=0x55e665955bf8, qc=0x7ffec93354e0) at
pquery.c:1277
#14 0x000055e664b32a28 in PortalRun (portal=0x55e6658d5190,
count=9223372036854775807, isTopLevel=true, run_once=true, 
    dest=0x55e665955bf8, altdest=0x55e665955bf8, qc=0x7ffec93354e0) at
pquery.c:791
#15 0x000055e664b2b803 in exec_simple_query (
    query_string=0x55e6658593b0 "MERGE INTO target t1 USING target t2 ON
t1.tid = t2.tid WHEN MATCHED THEN UPDATE SET val = 0") at postgres.c:1250
#16 0x000055e664b30722 in PostgresMain (dbname=0x55e665885938 "regression",
username=0x55e665885918 "law")
    at postgres.c:4593
#17 0x000055e664a55c23 in BackendRun (port=0x55e66587d420) at
postmaster.c:4511
#18 0x000055e664a554aa in BackendStartup (port=0x55e66587d420) at
postmaster.c:4239
#19 0x000055e664a51437 in ServerLoop () at postmaster.c:1806
#20 0x000055e664a50b94 in PostmasterMain (argc=3, argv=0x55e665853610) at
postmaster.c:1478
#21 0x000055e664944ad5 in main (argc=3, argv=0x55e665853610) at main.c:202

At first sight it appeared like another manifestation of bug #17798,
but it seems that this is a distinct defect.

As I can see, ExecModifyTable() for the "case CMD_MERGE" can pass to
ExecMerge() the resultRelInfo structure with ri_projectNew == null
(but ri_projectNewInfoValid == true (set in ExecInitMergeTupleSlots)).
This is not an issue till GetTupleForTrigger() in ExecBRUpdateTriggers()
gets a tuple. But when a concurrent update occurs, GetTupleForTrigger()
fails and ExecGetUpdateNewTuple() gets called, where
internalGetUpdateNewTuple() tries to get
relinfo->ri_projectNew->pi_exprContext...

The issue appeared with the MERGE introduction (7103ebb7a).


On Mon, 27 Feb 2023 at 08:06, PG Bug reporting form
<noreply@postgresql.org> wrote:
>
> When executing the following script I get a server crash with the following stack trace:
> Program terminated with signal SIGSEGV, Segmentation fault.
>

Confirmed. I can reliably reproduce this using 2 psql sessions as follows:

-- Setup
DROP TABLE IF EXISTS target;
CREATE TABLE target (tid integer, val integer);
INSERT INTO target VALUES (1, 0);
CREATE OR REPLACE FUNCTION brut_func() RETURNS trigger LANGUAGE plpgsql AS
  $$ BEGIN RETURN NEW; END; $$;
CREATE TRIGGER brut BEFORE UPDATE ON target
  FOR EACH ROW EXECUTE PROCEDURE brut_func();

-- Session 1
BEGIN;
UPDATE TARGET SET val = 0;

-- Session 2
MERGE INTO target t1 USING target t2 ON t1.tid = t2.tid
  WHEN MATCHED THEN UPDATE SET val = 0;

-- Session 1
COMMIT;

What's happening is that (in session 2) the trigger code in
ExecBRUpdateTriggers() detects the concurrent update and calls
ExecGetUpdateNewTuple() to get a new tuple. However,
ExecGetUpdateNewTuple() just calls internalGetUpdateNewTuple(), when
really it ought to call mergeGetUpdateNewTuple(), since we're in a
MERGE. Unfortunately there's no way for ExecGetUpdateNewTuple() to
know that, because it doesn't have the required context information.

A quick hack that fixes the problem is to pass the current
MergeActionState to ExecGetUpdateNewTuple(), which also then requires
that it be passed to ExecBRUpdateTriggers(). Changing the signature of
2 public functions in that way in back branches doesn't seem very nice
though.

OTOH, it's difficult to see how it can be fixed any other way.

I don't really like this change to ExecGetUpdateNewTuple(), but if we
are going to change it like this, then I think we should go all the
way and get rid of the GetUpdateNewTuple callback, and the separate
internalGetUpdateNewTuple() and mergeGetUpdateNewTuple() functions,
and just do it all in ExecGetUpdateNewTuple(), so that all the code is
in one place, making it easier to follow.

Thoughts?

Regards,
Dean

Attachment
On 2023-Feb-27, Dean Rasheed wrote:

> What's happening is that (in session 2) the trigger code in
> ExecBRUpdateTriggers() detects the concurrent update and calls
> ExecGetUpdateNewTuple() to get a new tuple. However,
> ExecGetUpdateNewTuple() just calls internalGetUpdateNewTuple(), when
> really it ought to call mergeGetUpdateNewTuple(), since we're in a
> MERGE. Unfortunately there's no way for ExecGetUpdateNewTuple() to
> know that, because it doesn't have the required context information.

Hmm, right.

> A quick hack that fixes the problem is to pass the current
> MergeActionState to ExecGetUpdateNewTuple(), which also then requires
> that it be passed to ExecBRUpdateTriggers(). Changing the signature of
> 2 public functions in that way in back branches doesn't seem very nice
> though.
> 
> OTOH, it's difficult to see how it can be fixed any other way.
> 
> I don't really like this change to ExecGetUpdateNewTuple(), but if we
> are going to change it like this, then I think we should go all the
> way and get rid of the GetUpdateNewTuple callback, and the separate
> internalGetUpdateNewTuple() and mergeGetUpdateNewTuple() functions,
> and just do it all in ExecGetUpdateNewTuple(), so that all the code is
> in one place, making it easier to follow.

I'm pretty sure that I split things this way (using a callback) because
the MERGE code was at arms-length, being in a different file, and there
was no reasonable way, without duplicating a lot of other code, to
implement the different behavior other than using a callback.

I then moved the MERGE code from a separate file into nodeModifyTable.c
and could have put it back in one piece, but didn't remember that it was
there.  It even looks like doing that may even make the code simpler, so
perhaps we should just do that -- nodeModifyTable.c if fully aware of
MERGE now, so it's not a problem.

-- 
Álvaro Herrera         PostgreSQL Developer  —  https://www.EnterpriseDB.com/
"Puedes vivir sólo una vez, pero si lo haces bien, una vez es suficiente"



On Tue, 7 Mar 2023 at 11:50, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
>
> > A quick hack that fixes the problem is to pass the current
> > MergeActionState to ExecGetUpdateNewTuple(), which also then requires
> > that it be passed to ExecBRUpdateTriggers(). Changing the signature of
> > 2 public functions in that way in back branches doesn't seem very nice
> > though.
> >
> > OTOH, it's difficult to see how it can be fixed any other way.
> >
> > I don't really like this change to ExecGetUpdateNewTuple(), but if we
> > are going to change it like this, then I think we should go all the
> > way and get rid of the GetUpdateNewTuple callback, and the separate
> > internalGetUpdateNewTuple() and mergeGetUpdateNewTuple() functions,
> > and just do it all in ExecGetUpdateNewTuple(), so that all the code is
> > in one place, making it easier to follow.
>
> I'm pretty sure that I split things this way (using a callback) because
> the MERGE code was at arms-length, being in a different file, and there
> was no reasonable way, without duplicating a lot of other code, to
> implement the different behavior other than using a callback.
>
> I then moved the MERGE code from a separate file into nodeModifyTable.c
> and could have put it back in one piece, but didn't remember that it was
> there.  It even looks like doing that may even make the code simpler, so
> perhaps we should just do that -- nodeModifyTable.c if fully aware of
> MERGE now, so it's not a problem.
>

So I wrote a patch doing that, and added some isolation tests
confirming that it fixes the reported issue.

Unfortunately, I think this exposes a different issue -- if the target
table has a BEFORE ROW trigger, and the target tuple is concurrently
updated, the trigger code will lock the updated tuple, and the merge
code won't see it as a concurrent update, and so it won't re-check the
WHEN condition. That leads to some quite surprising results, as shown
in the new isolation test cases.

This makes me wonder if perhaps the merge code ought to lock the
target tuple before checking WHEN conditions, in the same way that the
trigger code does before checking trigger WHEN clauses. If it did
that, the trigger code would never have to re-compute the new tuple,
so this change to  ExecBRUpdateTriggers() wouldn't be necessary. It
would also probably resolve the issue with cross-partition updates and
concurrently deleted tuples over on [1] and I think the requirement
for cpUpdateRetrySlot would go away. That seems like a pretty invasive
set of changes though.

[1] https://www.postgresql.org/message-id/20230215194224.egfyt3j5ewy4c6m3%40alvherre.pgsql

Regards,
Dean

Attachment
On Thu, 9 Mar 2023 at 13:11, Dean Rasheed <dean.a.rasheed@gmail.com> wrote:
>
> So I wrote a patch doing that, and added some isolation tests
> confirming that it fixes the reported issue.
>
> Unfortunately, I think this exposes a different issue -- if the target
> table has a BEFORE ROW trigger, and the target tuple is concurrently
> updated, the trigger code will lock the updated tuple, and the merge
> code won't see it as a concurrent update, and so it won't re-check the
> WHEN condition. That leads to some quite surprising results, as shown
> in the new isolation test cases.
>

I have been thinking about this some more, and I added a bunch of
additional isolation tests to test MERGE UPDATE/DELETE against
concurrent UPDATE/DELETE for normal tables, tables with BEFORE ROW
triggers, and partitioned tables where the MERGE does a
cross-partition update. Various of those tests currently fail, either
with the crash reported in this thread, or because they execute the
wrong merge action, or no action at all.

To fix this, it's necessary to pass the TM_Result and TM_FailureData
results from the trigger code's attempt to lock the old tuple, and
from the cross-partition-update code's attempt to delete the old
tuple, up to the merge code. This allows the merge code to handle
those cases in the same way as it does for non-partitioned tables
without triggers.

Overall, this leads to a few simplifications in nodeModifyTable.c:

1). ModifyTableContext->GetUpdateNewTuple() is now never called for
MERGE, so it and mergeGetUpdateNewTuple() can be deleted, and
ExecGetUpdateNewTuple() can be restored to how it was in v14.

2). ModifyTableContext->cpUpdateRetrySlot is no longer used by MERGE,
so it can also be deleted, and ExecCrossPartitionUpdate() can just
return retry_slot directly to ExecUpdateAct(), as it used to do.

3). ExecMergeMatched() becomes a bit simpler, since it no longer needs
special-case code for cross-partition updates.

Overall, this removes around 65 lines from nodeModifyTable.c, but more
importantly, it seems to fix the concurrent update issues.

Attached is an updated patch that I'm more happy with now, and a
slightly modified one for v15, keeping the trigger API
backwards-compatible for extensions.

Regards,
Dean

Attachment
On Sat, 11 Mar 2023 at 18:12, Dean Rasheed <dean.a.rasheed@gmail.com> wrote:
>
> Attached is an updated patch that I'm more happy with now, and a
> slightly modified one for v15, keeping the trigger API
> backwards-compatible for extensions.
>

Going over this again, I spotted another couple of things:

Passing recheckIndexes to ExecUpdateEpilogue() is pointless, since
that's something that's only ever computed and used locally within
ExecUpdateEpilogue(). It's also pointless for the caller to
list_free() it, since the caller only has a pointer to the empty list,
not the list built in ExecUpdateEpilogue(), which it should free. This
is pretty harmless, but I might as well tidy it up while hacking on
this.

If a BEFORE ROW trigger returns NULL to skip an update or delete,
ExecMergeMatched() will still update estate->es_processed, and so the
final row count in the command tag will be wrong. I'll tackle that in
a follow-on patch, since it's really a separate bug.

Regards,
Dean

Attachment
On Sun, 12 Mar 2023 at 11:14, Dean Rasheed <dean.a.rasheed@gmail.com> wrote:
>
> On Sat, 11 Mar 2023 at 18:12, Dean Rasheed <dean.a.rasheed@gmail.com> wrote:
> >
> > Attached is an updated patch that I'm more happy with now, and a
> > slightly modified one for v15, keeping the trigger API
> > backwards-compatible for extensions.
>
> If a BEFORE ROW trigger returns NULL to skip an update or delete,
> ExecMergeMatched() will still update estate->es_processed, and so the
> final row count in the command tag will be wrong. I'll tackle that in
> a follow-on patch, since it's really a separate bug.
>

OK, I have pushed fixes for both those issues.

Regards,
Dean