Thread: BUG #18830: ExecInitMerge Segfault on MERGE
The following bug has been logged on the website: Bug reference: 18830 Logged by: Robins Tharakan Email address: tharakan@gmail.com PostgreSQL version: Unsupported/Unknown Operating system: Ubuntu Description: With this SQL (backtraces below), postgres starts to segfault starting at cbc127917e. SQL === CREATE AGGREGATE d(double precision ORDER BY anyelement) ( SFUNC = ordered_set_transition_multi, STYPE = internal, FINALFUNC = rank_final); CREATE TABLE e ( a text, b integer ) PARTITION BY LIST (a); CREATE TABLE f ( a text, b integer ); CREATE TABLE g ( a text, b integer ); CREATE VIEW h AS SELECT XMLSERIALIZE(DOCUMENT '<foo><bar>42</bar></foo>'AS text ); ALTER TABLE e ATTACH PARTITION f FOR VALUES IN ('a'); ALTER TABLE e ATTACH PARTITION g FOR VALUES IN ('b'); MERGE INTO e USING h ON a = xmlserialize WHEN NOT MATCHED THEN INSERT VALUES (CAST(NULL AS text)); SQL Output ========== $ psql -p 9999 postgres -f ../sqith/repro_final.sql CREATE AGGREGATE CREATE TABLE CREATE TABLE CREATE TABLE CREATE VIEW ALTER TABLE ALTER TABLE psql:../sqith/repro_final.sql:25: server closed the connection unexpectedly This probably means the server terminated abnormally before or while processing the request. psql:../sqith/repro_final.sql:25: error: connection to server was lost Error Log ========= 2025-03-03 06:55:54.014 ACDT [137463] LOG: client backend (PID 137479) was terminated by signal 11: Segmentation fault 2025-03-03 06:55:54.014 ACDT [137463] DETAIL: Failed process was running: MERGE INTO e USING h ON a = xmlserialize WHEN NOT MATCHED THEN INSERT VALUES (CAST(NULL AS text)); 2025-03-03 06:55:54.014 ACDT [137463] LOG: terminating any other active server processes Commit ====== Checking (15a79c73111~0) - 15a79c7311 - fail Checking (15a79c73111~10) - 424ededc58 - fail Checking (15a79c73111~30) - 945a9e3832 - fail Checking (15a79c73111~70) - a4e986ef5a - fail Checking (15a79c73111~150) - 6a2275b895 - fail Checking (15a79c73111~310) - 117f9f328e - Pass Checking (15a79c73111~230) - c89525d57b - Pass Checking (15a79c73111~190) - c366d2bdba - fail Checking (15a79c73111~210) - fb056564ec - fail Checking (15a79c73111~220) - 44ec095751 - Pass Checking (15a79c73111~215) - cbc127917e - fail Checking (15a79c73111~217) - 428fadb7e9 - Pass Checking (15a79c73111~216) - 926c7fce03 - Pass Offending Commit is cbc127917e - HEAD~215 BackTrace ========= (gdb) bt #0 ExecInitMerge (mtstate=0x599c6b260330, estate=0x599c6b260080) at nodeModifyTable.c:3663 #1 0x0000599c2ca1ce11 in ExecInitModifyTable (node=0x599c6b282e08, estate=0x599c6b260080, eflags=0) at nodeModifyTable.c:4889 #2 0x0000599c2c9d63a3 in ExecInitNode (node=0x599c6b282e08, estate=0x599c6b260080, eflags=0) at execProcnode.c:177 #3 0x0000599c2c9cac66 in InitPlan (queryDesc=0x599c6b218ba0, eflags=0) at execMain.c:985 #4 0x0000599c2c9c99f0 in standard_ExecutorStart (queryDesc=0x599c6b218ba0, eflags=0) at execMain.c:259 #5 0x0000599c2c9c96fb in ExecutorStart (queryDesc=0x599c6b218ba0, eflags=0) at execMain.c:135 #6 0x0000599c2ccb3925 in ProcessQuery (plan=0x599c6b284188, sourceText=0x599c6b16c4e0 "MERGE INTO e USING h ON a = xmlserialize WHEN NOT MATCHED THEN INSERT VALUES (CAST(NULL AS text));", params=0x0, queryEnv=0x0, dest=0x599c6b284308, qc=0x7ffc8bc1aaf0) at pquery.c:155 #7 0x0000599c2ccb546e in PortalRunMulti (portal=0x599c6b1ef0a0, isTopLevel=true, setHoldSnapshot=false, dest=0x599c6b284308, altdest=0x599c6b284308, qc=0x7ffc8bc1aaf0) at pquery.c:1271 #8 0x0000599c2ccb497a in PortalRun (portal=0x599c6b1ef0a0, count=9223372036854775807, isTopLevel=true, dest=0x599c6b284308, altdest=0x599c6b284308, qc=0x7ffc8bc1aaf0) at pquery.c:787 #9 0x0000599c2ccad440 in exec_simple_query (query_string=0x599c6b16c4e0 "MERGE INTO e USING h ON a = xmlserialize WHEN NOT MATCHED THEN INSERT VALUES (CAST(NULL AS text));") at postgres.c:1271 #10 0x0000599c2ccb27ef in PostgresMain (dbname=0x599c6b1a6680 "smithreduce", username=0x599c6b1a6668 "smith") at postgres.c:4691 BackTrace Full ============== #0 ExecInitMerge (mtstate=0x599c6b260330, estate=0x599c6b260080) at nodeModifyTable.c:3663 mergeActionList = 0x599c6b281b70 joinCondition = 0x0 relationDesc = 0x599c2c9df7cb <ExecTypeFromTL+33> l = 0x100000000 lc__state = {l = 0x599c6b281bc0, i = 0} node = 0x599c6b282e08 rootRelInfo = 0x599c6b260f98 resultRelInfo = 0x599c6b260f80 econtext = 0x599c6b2695f8 lc = 0x599c6b281bd8 i = 1 __func__ = "ExecInitMerge" #1 0x0000599c2ca1ce11 in ExecInitModifyTable (node=0x599c6b282e08, estate=0x599c6b260080, eflags=0) at nodeModifyTable.c:4889 mtstate = 0x599c6b260330 subplan = 0x599c6b282d78 operation = CMD_MERGE nrels = 0 resultRelations = 0x0 withCheckOptionLists = 0x0 returningLists = 0x0 updateColnosLists = 0x0 resultRelInfo = 0x599c6b260f80 arowmarks = 0x0 l = 0x0 i = 0 rel = 0x7be2ba49e368 __func__ = "ExecInitModifyTable" #2 0x0000599c2c9d63a3 in ExecInitNode (node=0x599c6b282e08, estate=0x599c6b260080, eflags=0) at execProcnode.c:177 result = 0x0 subps = 0x599c6b282760 l = 0x599c6b2608c0 __func__ = "ExecInitNode" #3 0x0000599c2c9cac66 in InitPlan (queryDesc=0x599c6b218ba0, eflags=0) at execMain.c:985 operation = CMD_MERGE plannedstmt = 0x599c6b284188 plan = 0x599c6b282e08 rangeTable = 0x599c6b283228 estate = 0x599c6b260080 planstate = 0x599c6b1ab310 tupType = 0x599c6b2011f0 l = 0x0 i = 1 __func__ = "InitPlan" This (and bug #18828) found using SQLSmith / SQLReduce / creduce.
PG Bug reporting form <noreply@postgresql.org> 于2025年3月3日周一 18:31写道:
The following bug has been logged on the website:
Bug reference: 18830
Logged by: Robins Tharakan
Email address: tharakan@gmail.com
PostgreSQL version: Unsupported/Unknown
Operating system: Ubuntu
Description:
With this SQL (backtraces below), postgres starts to segfault starting at
cbc127917e.
SQL
===
CREATE AGGREGATE d(double precision ORDER BY anyelement) (
SFUNC = ordered_set_transition_multi,
STYPE = internal,
FINALFUNC = rank_final);
CREATE TABLE e (
a text,
b integer
) PARTITION BY LIST (a);
CREATE TABLE f (
a text,
b integer
);
CREATE TABLE g (
a text,
b integer
);
CREATE VIEW h AS
SELECT XMLSERIALIZE(DOCUMENT '<foo><bar>42</bar></foo>'AS text );
ALTER TABLE e ATTACH PARTITION f FOR VALUES IN ('a');
ALTER TABLE e ATTACH PARTITION g FOR VALUES IN ('b');
MERGE INTO e USING h ON a = xmlserialize WHEN NOT MATCHED THEN INSERT
VALUES (CAST(NULL AS text));
SQL Output
==========
$ psql -p 9999 postgres -f ../sqith/repro_final.sql
CREATE AGGREGATE
CREATE TABLE
CREATE TABLE
CREATE TABLE
CREATE VIEW
ALTER TABLE
ALTER TABLE
psql:../sqith/repro_final.sql:25: server closed the connection
unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
psql:../sqith/repro_final.sql:25: error: connection to server was lost
Error Log
=========
2025-03-03 06:55:54.014 ACDT [137463] LOG: client backend (PID 137479) was
terminated by signal 11: Segmentation fault
2025-03-03 06:55:54.014 ACDT [137463] DETAIL: Failed process was running:
MERGE INTO e USING h ON a = xmlserialize WHEN NOT MATCHED THEN INSERT
VALUES (CAST(NULL AS text));
2025-03-03 06:55:54.014 ACDT [137463] LOG: terminating any other active
server processes
Commit
======
Checking (15a79c73111~0) - 15a79c7311 - fail
Checking (15a79c73111~10) - 424ededc58 - fail
Checking (15a79c73111~30) - 945a9e3832 - fail
Checking (15a79c73111~70) - a4e986ef5a - fail
Checking (15a79c73111~150) - 6a2275b895 - fail
Checking (15a79c73111~310) - 117f9f328e - Pass
Checking (15a79c73111~230) - c89525d57b - Pass
Checking (15a79c73111~190) - c366d2bdba - fail
Checking (15a79c73111~210) - fb056564ec - fail
Checking (15a79c73111~220) - 44ec095751 - Pass
Checking (15a79c73111~215) - cbc127917e - fail
Checking (15a79c73111~217) - 428fadb7e9 - Pass
Checking (15a79c73111~216) - 926c7fce03 - Pass
Offending Commit is cbc127917e - HEAD~215
BackTrace
=========
(gdb) bt
#0 ExecInitMerge (mtstate=0x599c6b260330, estate=0x599c6b260080) at
nodeModifyTable.c:3663
#1 0x0000599c2ca1ce11 in ExecInitModifyTable (node=0x599c6b282e08,
estate=0x599c6b260080, eflags=0) at nodeModifyTable.c:4889
#2 0x0000599c2c9d63a3 in ExecInitNode (node=0x599c6b282e08,
estate=0x599c6b260080, eflags=0) at execProcnode.c:177
#3 0x0000599c2c9cac66 in InitPlan (queryDesc=0x599c6b218ba0, eflags=0) at
execMain.c:985
#4 0x0000599c2c9c99f0 in standard_ExecutorStart (queryDesc=0x599c6b218ba0,
eflags=0) at execMain.c:259
#5 0x0000599c2c9c96fb in ExecutorStart (queryDesc=0x599c6b218ba0, eflags=0)
at execMain.c:135
#6 0x0000599c2ccb3925 in ProcessQuery (plan=0x599c6b284188,
sourceText=0x599c6b16c4e0 "MERGE INTO e USING h ON a = xmlserialize
WHEN NOT MATCHED THEN INSERT VALUES (CAST(NULL AS text));", params=0x0,
queryEnv=0x0, dest=0x599c6b284308,
qc=0x7ffc8bc1aaf0) at pquery.c:155
#7 0x0000599c2ccb546e in PortalRunMulti (portal=0x599c6b1ef0a0,
isTopLevel=true, setHoldSnapshot=false, dest=0x599c6b284308,
altdest=0x599c6b284308, qc=0x7ffc8bc1aaf0)
at pquery.c:1271
#8 0x0000599c2ccb497a in PortalRun (portal=0x599c6b1ef0a0,
count=9223372036854775807, isTopLevel=true, dest=0x599c6b284308,
altdest=0x599c6b284308, qc=0x7ffc8bc1aaf0) at pquery.c:787
#9 0x0000599c2ccad440 in exec_simple_query (query_string=0x599c6b16c4e0
"MERGE INTO e USING h ON a = xmlserialize WHEN NOT MATCHED THEN INSERT
VALUES (CAST(NULL AS text));")
at postgres.c:1271
#10 0x0000599c2ccb27ef in PostgresMain (dbname=0x599c6b1a6680 "smithreduce",
username=0x599c6b1a6668 "smith") at postgres.c:4691
BackTrace Full
==============
#0 ExecInitMerge (mtstate=0x599c6b260330, estate=0x599c6b260080) at
nodeModifyTable.c:3663
mergeActionList = 0x599c6b281b70
joinCondition = 0x0
relationDesc = 0x599c2c9df7cb <ExecTypeFromTL+33>
l = 0x100000000
lc__state = {l = 0x599c6b281bc0, i = 0}
node = 0x599c6b282e08
rootRelInfo = 0x599c6b260f98
resultRelInfo = 0x599c6b260f80
econtext = 0x599c6b2695f8
lc = 0x599c6b281bd8
i = 1
__func__ = "ExecInitMerge"
#1 0x0000599c2ca1ce11 in ExecInitModifyTable (node=0x599c6b282e08,
estate=0x599c6b260080, eflags=0) at nodeModifyTable.c:4889
mtstate = 0x599c6b260330
subplan = 0x599c6b282d78
operation = CMD_MERGE
nrels = 0
resultRelations = 0x0
withCheckOptionLists = 0x0
returningLists = 0x0
updateColnosLists = 0x0
resultRelInfo = 0x599c6b260f80
arowmarks = 0x0
l = 0x0
i = 0
rel = 0x7be2ba49e368
__func__ = "ExecInitModifyTable"
#2 0x0000599c2c9d63a3 in ExecInitNode (node=0x599c6b282e08,
estate=0x599c6b260080, eflags=0) at execProcnode.c:177
result = 0x0
subps = 0x599c6b282760
l = 0x599c6b2608c0
__func__ = "ExecInitNode"
#3 0x0000599c2c9cac66 in InitPlan (queryDesc=0x599c6b218ba0, eflags=0) at
execMain.c:985
operation = CMD_MERGE
plannedstmt = 0x599c6b284188
plan = 0x599c6b282e08
rangeTable = 0x599c6b283228
estate = 0x599c6b260080
planstate = 0x599c6b1ab310
tupType = 0x599c6b2011f0
l = 0x0
i = 1
__func__ = "InitPlan"
This (and bug #18828) found using SQLSmith / SQLReduce / creduce.
Thanks for reporting.
I can reproduce this crash. On HEAD 3f1db99bfa, crash on a different place due to the commit 75dfde1.
But cbc127917e is the root cause commit.
MERGE INTO e USING h ON a = xmlserialize WHEN NOT MATCHED THEN INSERT
VALUES (CAST(NULL AS text));
VALUES (CAST(NULL AS text));
The plan of the above query looks as below:
QUERY PLAN
------------------------------------------------------------------
Merge on e (cost=0.00..58.29 rows=0 width=0)
-> Nested Loop Left Join (cost=0.00..58.29 rows=12 width=10)
-> Result (cost=0.00..0.01 rows=1 width=0)
-> Append (cost=0.00..58.16 rows=12 width=10)
Subplans Removed: 2
------------------------------------------------------------------
Merge on e (cost=0.00..58.29 rows=0 width=0)
-> Nested Loop Left Join (cost=0.00..58.29 rows=12 width=10)
-> Result (cost=0.00..0.01 rows=1 width=0)
-> Append (cost=0.00..58.16 rows=12 width=10)
Subplans Removed: 2
You can see that all partitions are pruned. After cbc127917e, we only consider unpruned relations, then the list resultRelations is empty.
the estate->es_unpruned_relids contains (1,2), the node->resultRelations contains (5,6).
nrels = list_length(resultRelations);
...
mtstate->resultRelInfo = (ResultRelInfo *)
palloc(nrels * sizeof(ResultRelInfo));
palloc(nrels * sizeof(ResultRelInfo));
The memory of mtstate->resultRelInfo point to is undefined. When we access its memory in ExecInitMerge(),
relationDesc = RelationGetDescr(resultRelInfo->ri_RelationDesc);
crash happened.
After 75dfde1, mergeActionLists list is empty, so in ExecInitMerge() we don't enter the foreach(lc, mergeActionLists),
and the mtstate->ps.ps_ExprContext has no chance to initialize. So in ExecMergeNotMatched(), econtext is NULL.
econtext->ecxt_scantuple = NULL;
The above statement can trigger a segment fault.
For Merge command NOT MATCH, should we need the logic that only consider unpruned relations in ExecInitModifyTable()?
Merge command seems more complex than update and delete. Can we consider the unpruned relations in ExecInitModifyTable()
according to the command type. For merge, we do the logic before cbc127917e, for now?
Thanks,
Tender Wang
Tender Wang <tndrwang@gmail.com> 于2025年3月3日周一 23:46写道:
Thanks for reporting.I can reproduce this crash. On HEAD 3f1db99bfa, crash on a different place due to the commit 75dfde1.But cbc127917e is the root cause commit.MERGE INTO e USING h ON a = xmlserialize WHEN NOT MATCHED THEN INSERT
VALUES (CAST(NULL AS text));The plan of the above query looks as below:QUERY PLAN
------------------------------------------------------------------
Merge on e (cost=0.00..58.29 rows=0 width=0)
-> Nested Loop Left Join (cost=0.00..58.29 rows=12 width=10)
-> Result (cost=0.00..0.01 rows=1 width=0)
-> Append (cost=0.00..58.16 rows=12 width=10)
Subplans Removed: 2You can see that all partitions are pruned. After cbc127917e, we only consider unpruned relations, then the list resultRelations is empty.the estate->es_unpruned_relids contains (1,2), the node->resultRelations contains (5,6).nrels = list_length(resultRelations);...mtstate->resultRelInfo = (ResultRelInfo *)
palloc(nrels * sizeof(ResultRelInfo));The memory of mtstate->resultRelInfo point to is undefined. When we access its memory in ExecInitMerge(),relationDesc = RelationGetDescr(resultRelInfo->ri_RelationDesc);crash happened.After 75dfde1, mergeActionLists list is empty, so in ExecInitMerge() we don't enter the foreach(lc, mergeActionLists),and the mtstate->ps.ps_ExprContext has no chance to initialize. So in ExecMergeNotMatched(), econtext is NULL.econtext->ecxt_scantuple = NULL;The above statement can trigger a segment fault.For Merge command NOT MATCH, should we need the logic that only consider unpruned relations in ExecInitModifyTable()?Merge command seems more complex than update and delete. Can we consider the unpruned relations in ExecInitModifyTable()according to the command type. For merge, we do the logic before cbc127917e, for now?
I found the partition_prune.sql does not cover merge into ... not match case, and I found an easy reproduce step, seeing below:
using (select stable_one() + 2 as pid) as q join part_abc_1 pt1 on (true) on pt.a = stable_one() +2
when not matched then insert values(1, 'd', false);
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: Succeeded.
Thanks,
Tender Wang
On Tue, 4 Mar 2025 at 19:58, Tender Wang <tndrwang@gmail.com> wrote: > I found the partition_prune.sql does not cover merge into ... not match case, and I found an easy reproduce step, seeingbelow: > > postgres=# merge into part_abc_view pt > using (select stable_one() + 2 as pid) as q join part_abc_1 pt1 on (true) on pt.a = stable_one() +2 > when not matched then insert values(1, 'd', false); > server closed the connection unexpectedly > This probably means the server terminated abnormally > before or while processing the request. > The connection to the server was lost. Attempting reset: Succeeded. It looks like this is happening because ExecInitModifyTable() skips adding mergeActionsList items when bms_is_member(rti, estate->es_unpruned_relids) is false for all resultRelations. This results in an empty actions list. Because the MERGE is performing a LEFT JOIN for the NOT MATCHED, ExecMerge() gets a row and runs ExecMergeNotMatched(), which crashes on "econtext->ecxt_scantuple = NULL;" because of a NULL econtext. econtext is NULL because ExecInitMerge() skips calling ExecAssignExprContext() when mergeActionLists is empty. There are a couple of ways I can see to fix this, 1) would be to move the ExecAssignExprContext() above the "if (mergeActionLists == NIL)" in ExecInitMerge(), or 2) add code to return NULL in ExecMergeNotMatched() if actionStates is NULL. I think maybe #1 is the better option as #2 adds additional code that executes on every ExecMergeNotMatched() call. The patch does #1. We should probably add your test case too. David
Attachment
On Tue, Mar 4, 2025 at 3:00 PM David Rowley <dgrowleyml@gmail.com> wrote: > On Tue, 4 Mar 2025 at 19:58, Tender Wang <tndrwang@gmail.com> wrote: > > I found the partition_prune.sql does not cover merge into ... not match case, and I found an easy reproduce step, seeing below: > > > > postgres=# merge into part_abc_view pt > > using (select stable_one() + 2 as pid) as q join part_abc_1 pt1 on (true) on pt.a = stable_one() +2 > > when not matched then insert values(1, 'd', false); > > server closed the connection unexpectedly > > This probably means the server terminated abnormally > > before or while processing the request. > > The connection to the server was lost. Attempting reset: Succeeded. > > It looks like this is happening because ExecInitModifyTable() skips > adding mergeActionsList items when bms_is_member(rti, > estate->es_unpruned_relids) is false for all resultRelations. This > results in an empty actions list. Because the MERGE is performing a > LEFT JOIN for the NOT MATCHED, ExecMerge() gets a row and runs > ExecMergeNotMatched(), which crashes on "econtext->ecxt_scantuple = > NULL;" because of a NULL econtext. econtext is NULL because > ExecInitMerge() skips calling ExecAssignExprContext() when > mergeActionLists is empty. > > There are a couple of ways I can see to fix this, 1) would be to move > the ExecAssignExprContext() above the "if (mergeActionLists == NIL)" > in ExecInitMerge(), or 2) add code to return NULL in > ExecMergeNotMatched() if actionStates is NULL. > > I think maybe #1 is the better option as #2 adds additional code that > executes on every ExecMergeNotMatched() call. The patch does #1. We > should probably add your test case too. Thanks for taking a look and the patch. I was thinking #1 too. Though I was thinking of doing the ps_ExprContext initialization in ExecInitModifyTable() before calling ExecInitMerge(), to be similar to what we do for other ModifyTable properties that need one. -- Thanks, Amit Langote
David Rowley <dgrowleyml@gmail.com> 于2025年3月4日周二 17:30写道:
On Tue, 4 Mar 2025 at 19:58, Tender Wang <tndrwang@gmail.com> wrote:
> I found the partition_prune.sql does not cover merge into ... not match case, and I found an easy reproduce step, seeing below:
>
> postgres=# merge into part_abc_view pt
> using (select stable_one() + 2 as pid) as q join part_abc_1 pt1 on (true) on pt.a = stable_one() +2
> when not matched then insert values(1, 'd', false);
> server closed the connection unexpectedly
> This probably means the server terminated abnormally
> before or while processing the request.
> The connection to the server was lost. Attempting reset: Succeeded.
It looks like this is happening because ExecInitModifyTable() skips
adding mergeActionsList items when bms_is_member(rti,
estate->es_unpruned_relids) is false for all resultRelations. This
results in an empty actions list. Because the MERGE is performing a
LEFT JOIN for the NOT MATCHED, ExecMerge() gets a row and runs
ExecMergeNotMatched(), which crashes on "econtext->ecxt_scantuple =
NULL;" because of a NULL econtext. econtext is NULL because
ExecInitMerge() skips calling ExecAssignExprContext() when
mergeActionLists is empty.
There are a couple of ways I can see to fix this, 1) would be to move
the ExecAssignExprContext() above the "if (mergeActionLists == NIL)"
in ExecInitMerge(), or 2) add code to return NULL in
ExecMergeNotMatched() if actionStates is NULL.
I think maybe #1 is the better option as #2 adds additional code that
executes on every ExecMergeNotMatched() call. The patch does #1. We
should probably add your test case too.
Hmm, apply your patch, I get different results when set enable_partition_pruning = off, seeing below:
postgres=# merge into part_abc_view ptusing (select stable_one() + 2 as pid) as q join part_abc_1 pt1 on (true)
on pt.a = stable_one() +2
when not matched then insert values(1, 'd', false);
MERGE 0
postgres=# set enable_partition_pruning = off;
SET
postgres=# merge into part_abc_view pt
using (select stable_one() + 2 as pid) as q join part_abc_1 pt1 on (true)
on pt.a = stable_one() +2
when not matched then insert values(1, 'd', false);
MERGE 1
if all partitions are pruned, the resultRelation's mergeActions has no chance to be set.
Thanks,
Tender Wang
On Tue, Mar 4, 2025 at 3:16 PM Tender Wang <tndrwang@gmail.com> wrote: > David Rowley <dgrowleyml@gmail.com> 于2025年3月4日周二 17:30写道: >> >> On Tue, 4 Mar 2025 at 19:58, Tender Wang <tndrwang@gmail.com> wrote: >> > I found the partition_prune.sql does not cover merge into ... not match case, and I found an easy reproduce step, seeing below: >> > >> > postgres=# merge into part_abc_view pt >> > using (select stable_one() + 2 as pid) as q join part_abc_1 pt1 on (true) on pt.a = stable_one() +2 >> > when not matched then insert values(1, 'd', false); >> > server closed the connection unexpectedly >> > This probably means the server terminated abnormally >> > before or while processing the request. >> > The connection to the server was lost. Attempting reset: Succeeded. >> >> It looks like this is happening because ExecInitModifyTable() skips >> adding mergeActionsList items when bms_is_member(rti, >> estate->es_unpruned_relids) is false for all resultRelations. This >> results in an empty actions list. Because the MERGE is performing a >> LEFT JOIN for the NOT MATCHED, ExecMerge() gets a row and runs >> ExecMergeNotMatched(), which crashes on "econtext->ecxt_scantuple = >> NULL;" because of a NULL econtext. econtext is NULL because >> ExecInitMerge() skips calling ExecAssignExprContext() when >> mergeActionLists is empty. >> >> There are a couple of ways I can see to fix this, 1) would be to move >> the ExecAssignExprContext() above the "if (mergeActionLists == NIL)" >> in ExecInitMerge(), or 2) add code to return NULL in >> ExecMergeNotMatched() if actionStates is NULL. >> >> I think maybe #1 is the better option as #2 adds additional code that >> executes on every ExecMergeNotMatched() call. The patch does #1. We >> should probably add your test case too. >> > > Hmm, apply your patch, I get different results when set enable_partition_pruning = off, seeing below: > postgres=# merge into part_abc_view pt > using (select stable_one() + 2 as pid) as q join part_abc_1 pt1 on (true) > on pt.a = stable_one() +2 > when not matched then insert values(1, 'd', false); > MERGE 0 > postgres=# set enable_partition_pruning = off; > SET > postgres=# merge into part_abc_view pt > using (select stable_one() + 2 as pid) as q join part_abc_1 pt1 on (true) > on pt.a = stable_one() +2 > when not matched then insert values(1, 'd', false); > MERGE 1 Hmm, interesting. Can you share the full test case? What's the behavior on v17 and older? Just want to be sure if we're looking at another bug in the code committed in v18. -- Thanks, Amit Langote
Amit Langote <amitlangote09@gmail.com> 于2025年3月4日周二 18:10写道:
On Tue, Mar 4, 2025 at 3:16 PM Tender Wang <tndrwang@gmail.com> wrote:
> David Rowley <dgrowleyml@gmail.com> 于2025年3月4日周二 17:30写道:
>>
>> On Tue, 4 Mar 2025 at 19:58, Tender Wang <tndrwang@gmail.com> wrote:
>> > I found the partition_prune.sql does not cover merge into ... not match case, and I found an easy reproduce step, seeing below:
>> >
>> > postgres=# merge into part_abc_view pt
>> > using (select stable_one() + 2 as pid) as q join part_abc_1 pt1 on (true) on pt.a = stable_one() +2
>> > when not matched then insert values(1, 'd', false);
>> > server closed the connection unexpectedly
>> > This probably means the server terminated abnormally
>> > before or while processing the request.
>> > The connection to the server was lost. Attempting reset: Succeeded.
>>
>> It looks like this is happening because ExecInitModifyTable() skips
>> adding mergeActionsList items when bms_is_member(rti,
>> estate->es_unpruned_relids) is false for all resultRelations. This
>> results in an empty actions list. Because the MERGE is performing a
>> LEFT JOIN for the NOT MATCHED, ExecMerge() gets a row and runs
>> ExecMergeNotMatched(), which crashes on "econtext->ecxt_scantuple =
>> NULL;" because of a NULL econtext. econtext is NULL because
>> ExecInitMerge() skips calling ExecAssignExprContext() when
>> mergeActionLists is empty.
>>
>> There are a couple of ways I can see to fix this, 1) would be to move
>> the ExecAssignExprContext() above the "if (mergeActionLists == NIL)"
>> in ExecInitMerge(), or 2) add code to return NULL in
>> ExecMergeNotMatched() if actionStates is NULL.
>>
>> I think maybe #1 is the better option as #2 adds additional code that
>> executes on every ExecMergeNotMatched() call. The patch does #1. We
>> should probably add your test case too.
>>
>
> Hmm, apply your patch, I get different results when set enable_partition_pruning = off, seeing below:
> postgres=# merge into part_abc_view pt
> using (select stable_one() + 2 as pid) as q join part_abc_1 pt1 on (true)
> on pt.a = stable_one() +2
> when not matched then insert values(1, 'd', false);
> MERGE 0
> postgres=# set enable_partition_pruning = off;
> SET
> postgres=# merge into part_abc_view pt
> using (select stable_one() + 2 as pid) as q join part_abc_1 pt1 on (true)
> on pt.a = stable_one() +2
> when not matched then insert values(1, 'd', false);
> MERGE 1
Hmm, interesting. Can you share the full test case?
What's the behavior on v17 and older? Just want to be sure if we're
looking at another bug in the code committed in v18.
The test case comes from partition_prune.sql, but now we don't cover merge .. not match.
create table part_abc_1 (b text, a int, c bool);
create table part_abc_2 (a int, c bool, b text);
alter table part_abc attach partition part_abc_1 for values in (1);
alter table part_abc attach partition part_abc_2 for values in (2);
insert into part_abc values (1, 'b', true);
insert into part_abc values (2, 'c', true);
create view part_abc_view as select * from part_abc where b <> 'a' with check option;
create function stable_one() returns int as $$ begin return 1; end; $$ language plpgsql stable;
merge into part_abc_view pt
using (select stable_one() + 2 as pid) as q join part_abc_1 pt1 on (true)
on pt.a = stable_one() +2
when not matched then insert values(1, 'd', false);
Thanks,
Tender Wang
Amit Langote <amitlangote09@gmail.com> 于2025年3月4日周二 18:10写道:
Hmm, interesting. Can you share the full test case?
What's the behavior on v17 and older? Just want to be sure if we're
looking at another bug in the code committed in v18.
I do test on 17.1, the results are consistent. See below:
Type "help" for help.
postgres=# create table part_abc (a int, b text, c bool) partition by list (a);
create table part_abc_1 (b text, a int, c bool);
create table part_abc_2 (a int, c bool, b text);
alter table part_abc attach partition part_abc_1 for values in (1);
alter table part_abc attach partition part_abc_2 for values in (2);
insert into part_abc values (1, 'b', true);
insert into part_abc values (2, 'c', true);
create view part_abc_view as select * from part_abc where b <> 'a' with check option;
create function stable_one() returns int as $$ begin return 1; end; $$ language plpgsql stable;
merge into part_abc_view pt
using (select stable_one() + 2 as pid) as q join part_abc_1 pt1 on (true)
on pt.a = stable_one() +2
when not matched then insert values(1, 'd', false);
CREATE TABLE
CREATE TABLE
CREATE TABLE
ALTER TABLE
ALTER TABLE
INSERT 0 1
INSERT 0 1
CREATE VIEW
CREATE FUNCTION
MERGE 1
postgres=# show enable_partition_pruning ;
enable_partition_pruning
--------------------------
on
(1 row)
postgres=# create database test;
CREATE DATABASE
postgres=# \c test
You are now connected to database "test" as user "ecs-user".
test=# set enable_partition_pruning = off;
SET
test=# show enable_partition_pruning ;
enable_partition_pruning
--------------------------
off
(1 row)
test=# create table part_abc (a int, b text, c bool) partition by list (a);
create table part_abc_1 (b text, a int, c bool);
create table part_abc_2 (a int, c bool, b text);
alter table part_abc attach partition part_abc_1 for values in (1);
alter table part_abc attach partition part_abc_2 for values in (2);
insert into part_abc values (1, 'b', true);
insert into part_abc values (2, 'c', true);
create view part_abc_view as select * from part_abc where b <> 'a' with check option;
create function stable_one() returns int as $$ begin return 1; end; $$ language plpgsql stable;
merge into part_abc_view pt
using (select stable_one() + 2 as pid) as q join part_abc_1 pt1 on (true)
on pt.a = stable_one() +2
when not matched then insert values(1, 'd', false);
CREATE TABLE
CREATE TABLE
CREATE TABLE
ALTER TABLE
ALTER TABLE
INSERT 0 1
INSERT 0 1
CREATE VIEW
CREATE FUNCTION
MERGE 1
Thanks,
Tender Wang
Amit Langote <amitlangote09@gmail.com> 于2025年3月4日周二 18:10写道:
On Tue, Mar 4, 2025 at 3:16 PM Tender Wang <tndrwang@gmail.com> wrote:
> David Rowley <dgrowleyml@gmail.com> 于2025年3月4日周二 17:30写道:
>>
>> On Tue, 4 Mar 2025 at 19:58, Tender Wang <tndrwang@gmail.com> wrote:
>> > I found the partition_prune.sql does not cover merge into ... not match case, and I found an easy reproduce step, seeing below:
>> >
>> > postgres=# merge into part_abc_view pt
>> > using (select stable_one() + 2 as pid) as q join part_abc_1 pt1 on (true) on pt.a = stable_one() +2
>> > when not matched then insert values(1, 'd', false);
>> > server closed the connection unexpectedly
>> > This probably means the server terminated abnormally
>> > before or while processing the request.
>> > The connection to the server was lost. Attempting reset: Succeeded.
>>
>> It looks like this is happening because ExecInitModifyTable() skips
>> adding mergeActionsList items when bms_is_member(rti,
>> estate->es_unpruned_relids) is false for all resultRelations. This
>> results in an empty actions list. Because the MERGE is performing a
>> LEFT JOIN for the NOT MATCHED, ExecMerge() gets a row and runs
>> ExecMergeNotMatched(), which crashes on "econtext->ecxt_scantuple =
>> NULL;" because of a NULL econtext. econtext is NULL because
>> ExecInitMerge() skips calling ExecAssignExprContext() when
>> mergeActionLists is empty.
>>
>> There are a couple of ways I can see to fix this, 1) would be to move
>> the ExecAssignExprContext() above the "if (mergeActionLists == NIL)"
>> in ExecInitMerge(), or 2) add code to return NULL in
>> ExecMergeNotMatched() if actionStates is NULL.
>>
>> I think maybe #1 is the better option as #2 adds additional code that
>> executes on every ExecMergeNotMatched() call. The patch does #1. We
>> should probably add your test case too.
>>
>
> Hmm, apply your patch, I get different results when set enable_partition_pruning = off, seeing below:
> postgres=# merge into part_abc_view pt
> using (select stable_one() + 2 as pid) as q join part_abc_1 pt1 on (true)
> on pt.a = stable_one() +2
> when not matched then insert values(1, 'd', false);
> MERGE 0
> postgres=# set enable_partition_pruning = off;
> SET
> postgres=# merge into part_abc_view pt
> using (select stable_one() + 2 as pid) as q join part_abc_1 pt1 on (true)
> on pt.a = stable_one() +2
> when not matched then insert values(1, 'd', false);
> MERGE 1
Hmm, interesting. Can you share the full test case?
What's the behavior on v17 and older? Just want to be sure if we're
looking at another bug in the code committed in v18.
Because all partitions are pruned, so bms_is_member(rti,
estate->es_unpruned_relids) is false, then mergeActionLists is empty.Even though we initialize econtext to not null, but in ExecMergeNotMatched(),
because actionStates list is empty, so no merge operation happens.
actionStates is empty due to empty mergeActionLists.
-- Thanks,
Tender Wang
On Tue, Mar 4, 2025 at 4:36 PM Tender Wang <tndrwang@gmail.com> wrote: > Amit Langote <amitlangote09@gmail.com> 于2025年3月4日周二 18:10写道: >> On Tue, Mar 4, 2025 at 3:16 PM Tender Wang <tndrwang@gmail.com> wrote: >> > David Rowley <dgrowleyml@gmail.com> 于2025年3月4日周二 17:30写道: >> >> There are a couple of ways I can see to fix this, 1) would be to move >> >> the ExecAssignExprContext() above the "if (mergeActionLists == NIL)" >> >> in ExecInitMerge(), or 2) add code to return NULL in >> >> ExecMergeNotMatched() if actionStates is NULL. >> >> >> >> I think maybe #1 is the better option as #2 adds additional code that >> >> executes on every ExecMergeNotMatched() call. The patch does #1. We >> >> should probably add your test case too. >> >> >> > Hmm, apply your patch, I get different results when set enable_partition_pruning = off, seeing below: >> > postgres=# merge into part_abc_view pt >> > using (select stable_one() + 2 as pid) as q join part_abc_1 pt1 on (true) >> > on pt.a = stable_one() +2 >> > when not matched then insert values(1, 'd', false); >> > MERGE 0 >> > postgres=# set enable_partition_pruning = off; >> > SET >> > postgres=# merge into part_abc_view pt >> > using (select stable_one() + 2 as pid) as q join part_abc_1 pt1 on (true) >> > on pt.a = stable_one() +2 >> > when not matched then insert values(1, 'd', false); >> > MERGE 1 >> >> Hmm, interesting. Can you share the full test case? >> >> What's the behavior on v17 and older? Just want to be sure if we're >> looking at another bug in the code committed in v18. > > > Because all partitions are pruned, so bms_is_member(rti, > estate->es_unpruned_relids) is false, then mergeActionLists is empty. > Even though we initialize econtext to not null, but in ExecMergeNotMatched(), > because actionStates list is empty, so no merge operation happens. > actionStates is empty due to empty mergeActionLists. Yes, I get that. What I am wondering about is whether the case you presented where disabling partition pruning gives a different result for the same query has anything to do with this particular bug that's causing the crash Robins reported. If there's a separate bug, we have to check if it is present in v17 and older. -- Thanks, Amit Langote
On Mon, Mar 3, 2025 at 9:16 PM Tender Wang <tndrwang@gmail.com> wrote: > nrels = list_length(resultRelations); > ... > mtstate->resultRelInfo = (ResultRelInfo *) > palloc(nrels * sizeof(ResultRelInfo)); > > The memory of mtstate->resultRelInfo point to is undefined. When we access its memory in ExecInitMerge(), This needs to be fixed saparately. > relationDesc = RelationGetDescr(resultRelInfo->ri_RelationDesc); > > crash happened. Do you have a case where this access to undefined ModifyTableState.resultRelInfo occurs? I would have thought that it should not happen. -- Thanks, Amit Langote
Amit Langote <amitlangote09@gmail.com> 于2025年3月4日周二 19:51写道:
On Mon, Mar 3, 2025 at 9:16 PM Tender Wang <tndrwang@gmail.com> wrote:
> nrels = list_length(resultRelations);
> ...
> mtstate->resultRelInfo = (ResultRelInfo *)
> palloc(nrels * sizeof(ResultRelInfo));
>
> The memory of mtstate->resultRelInfo point to is undefined. When we access its memory in ExecInitMerge(),
This needs to be fixed saparately.
> relationDesc = RelationGetDescr(resultRelInfo->ri_RelationDesc);
>
> crash happened.
Do you have a case where this access to undefined
ModifyTableState.resultRelInfo occurs? I would have thought that it
should not happen.
"undefined" may not be accurate, "invalid" seems more correct.
I still use the case:
merge into part_abc_view ptusing (select stable_one() + 2 as pid) as q join part_abc_1 pt1 on (true)
on pt.a = stable_one() +2
when not matched then insert values(1, 'd', false);
$5 = {type = 2139062142, ri_RangeTableIndex = 2139062143, ri_RelationDesc = 0x180, ri_NumIndices = 195, ri_IndexRelationDescs = 0x500000182, ri_IndexRelationInfo = 0x7f8773b2aa38, ri_RowIdAttNo = 0, ri_extraUpdatedCols = 0x0, ri_projectNew = 0x0,
ri_newTupleSlot = 0x0, ri_oldTupleSlot = 0x0, ri_projectNewInfoValid = false, ri_needLockTagTuple = false, ri_TrigDesc = 0x0, ri_TrigFunctions = 0x0, ri_TrigWhenExprs = 0x0, ri_TrigInstrument = 0x0, ri_ReturningSlot = 0x0, ri_TrigOldSlot = 0x0,
ri_TrigNewSlot = 0x0, ri_AllNullSlot = 0x0, ri_FdwRoutine = 0x0, ri_FdwState = 0x0, ri_usesFdwDirectModify = false, ri_NumSlots = 0, ri_NumSlotsInitialized = 0, ri_BatchSize = 0, ri_Slots = 0x0, ri_PlanSlots = 0x0, ri_WithCheckOptions = 0x0,
ri_WithCheckOptionExprs = 0x0, ri_ConstraintExprs = 0x0, ri_GeneratedExprsI = 0x0, ri_GeneratedExprsU = 0x0, ri_NumGeneratedNeededI = 0, ri_NumGeneratedNeededU = 0, ri_returningList = 0x0, ri_projectReturning = 0x0,
ri_onConflictArbiterIndexes = 0x0, ri_onConflict = 0x0, ri_MergeActions = {0x0, 0x0, 0x0}, ri_MergeJoinCondition = 0x0, ri_PartitionCheckExpr = 0x0, ri_ChildToRootMap = 0x0, ri_ChildToRootMapValid = false, ri_RootToChildMap = 0x0,
ri_RootToChildMapValid = false, ri_RootResultRelInfo = 0x0, ri_PartitionTupleSlot = 0x0, ri_CopyMultiInsertBuffer = 0x0, ri_ancestorResultRels = 0x0}
ri_RelationDesc = 0x180, this address is not invalid.
Thanks,
Tender Wang
On Tue, Mar 4, 2025 at 5:36 PM Tender Wang <tndrwang@gmail.com> wrote: > Amit Langote <amitlangote09@gmail.com> 于2025年3月4日周二 19:51写道: >> On Mon, Mar 3, 2025 at 9:16 PM Tender Wang <tndrwang@gmail.com> wrote: >> > nrels = list_length(resultRelations); >> > ... >> > mtstate->resultRelInfo = (ResultRelInfo *) >> > palloc(nrels * sizeof(ResultRelInfo)); >> > >> > The memory of mtstate->resultRelInfo point to is undefined. When we access its memory in ExecInitMerge(), >> >> This needs to be fixed saparately. >> >> > relationDesc = RelationGetDescr(resultRelInfo->ri_RelationDesc); >> > >> > crash happened. >> >> Do you have a case where this access to undefined >> ModifyTableState.resultRelInfo occurs? I would have thought that it >> should not happen. > > > "undefined" may not be accurate, "invalid" seems more correct. > I still use the case: > merge into part_abc_view pt > using (select stable_one() + 2 as pid) as q join part_abc_1 pt1 on (true) > on pt.a = stable_one() +2 > when not matched then insert values(1, 'd', false); > > (gdb) p *resultRelInfo > $5 = {type = 2139062142, ri_RangeTableIndex = 2139062143, ri_RelationDesc = 0x180, ri_NumIndices = 195, ri_IndexRelationDescs= 0x500000182, ri_IndexRelationInfo = 0x7f8773b2aa38, ri_RowIdAttNo = 0, ri_extraUpdatedCols = 0x0,ri_projectNew = 0x0, > ri_newTupleSlot = 0x0, ri_oldTupleSlot = 0x0, ri_projectNewInfoValid = false, ri_needLockTagTuple = false, ri_TrigDesc= 0x0, ri_TrigFunctions = 0x0, ri_TrigWhenExprs = 0x0, ri_TrigInstrument = 0x0, ri_ReturningSlot = 0x0, ri_TrigOldSlot= 0x0, > ri_TrigNewSlot = 0x0, ri_AllNullSlot = 0x0, ri_FdwRoutine = 0x0, ri_FdwState = 0x0, ri_usesFdwDirectModify = false, ri_NumSlots= 0, ri_NumSlotsInitialized = 0, ri_BatchSize = 0, ri_Slots = 0x0, ri_PlanSlots = 0x0, ri_WithCheckOptions = 0x0, > ri_WithCheckOptionExprs = 0x0, ri_ConstraintExprs = 0x0, ri_GeneratedExprsI = 0x0, ri_GeneratedExprsU = 0x0, ri_NumGeneratedNeededI= 0, ri_NumGeneratedNeededU = 0, ri_returningList = 0x0, ri_projectReturning = 0x0, > ri_onConflictArbiterIndexes = 0x0, ri_onConflict = 0x0, ri_MergeActions = {0x0, 0x0, 0x0}, ri_MergeJoinCondition = 0x0,ri_PartitionCheckExpr = 0x0, ri_ChildToRootMap = 0x0, ri_ChildToRootMapValid = false, ri_RootToChildMap = 0x0, > ri_RootToChildMapValid = false, ri_RootResultRelInfo = 0x0, ri_PartitionTupleSlot = 0x0, ri_CopyMultiInsertBuffer = 0x0,ri_ancestorResultRels = 0x0} > > ri_RelationDesc = 0x180, this address is not invalid. Does the query crash or are you just able to see invalid memory address using gdb? -- Thanks, Amit Langote
Amit Langote <amitlangote09@gmail.com> 于2025年3月4日周二 20:30写道:
On Tue, Mar 4, 2025 at 5:36 PM Tender Wang <tndrwang@gmail.com> wrote:
> Amit Langote <amitlangote09@gmail.com> 于2025年3月4日周二 19:51写道:
>> On Mon, Mar 3, 2025 at 9:16 PM Tender Wang <tndrwang@gmail.com> wrote:
>> > nrels = list_length(resultRelations);
>> > ...
>> > mtstate->resultRelInfo = (ResultRelInfo *)
>> > palloc(nrels * sizeof(ResultRelInfo));
>> >
>> > The memory of mtstate->resultRelInfo point to is undefined. When we access its memory in ExecInitMerge(),
>>
>> This needs to be fixed saparately.
>>
>> > relationDesc = RelationGetDescr(resultRelInfo->ri_RelationDesc);
>> >
>> > crash happened.
>>
>> Do you have a case where this access to undefined
>> ModifyTableState.resultRelInfo occurs? I would have thought that it
>> should not happen.
>
>
> "undefined" may not be accurate, "invalid" seems more correct.
> I still use the case:
> merge into part_abc_view pt
> using (select stable_one() + 2 as pid) as q join part_abc_1 pt1 on (true)
> on pt.a = stable_one() +2
> when not matched then insert values(1, 'd', false);
>
> (gdb) p *resultRelInfo
> $5 = {type = 2139062142, ri_RangeTableIndex = 2139062143, ri_RelationDesc = 0x180, ri_NumIndices = 195, ri_IndexRelationDescs = 0x500000182, ri_IndexRelationInfo = 0x7f8773b2aa38, ri_RowIdAttNo = 0, ri_extraUpdatedCols = 0x0, ri_projectNew = 0x0,
> ri_newTupleSlot = 0x0, ri_oldTupleSlot = 0x0, ri_projectNewInfoValid = false, ri_needLockTagTuple = false, ri_TrigDesc = 0x0, ri_TrigFunctions = 0x0, ri_TrigWhenExprs = 0x0, ri_TrigInstrument = 0x0, ri_ReturningSlot = 0x0, ri_TrigOldSlot = 0x0,
> ri_TrigNewSlot = 0x0, ri_AllNullSlot = 0x0, ri_FdwRoutine = 0x0, ri_FdwState = 0x0, ri_usesFdwDirectModify = false, ri_NumSlots = 0, ri_NumSlotsInitialized = 0, ri_BatchSize = 0, ri_Slots = 0x0, ri_PlanSlots = 0x0, ri_WithCheckOptions = 0x0,
> ri_WithCheckOptionExprs = 0x0, ri_ConstraintExprs = 0x0, ri_GeneratedExprsI = 0x0, ri_GeneratedExprsU = 0x0, ri_NumGeneratedNeededI = 0, ri_NumGeneratedNeededU = 0, ri_returningList = 0x0, ri_projectReturning = 0x0,
> ri_onConflictArbiterIndexes = 0x0, ri_onConflict = 0x0, ri_MergeActions = {0x0, 0x0, 0x0}, ri_MergeJoinCondition = 0x0, ri_PartitionCheckExpr = 0x0, ri_ChildToRootMap = 0x0, ri_ChildToRootMapValid = false, ri_RootToChildMap = 0x0,
> ri_RootToChildMapValid = false, ri_RootResultRelInfo = 0x0, ri_PartitionTupleSlot = 0x0, ri_CopyMultiInsertBuffer = 0x0, ri_ancestorResultRels = 0x0}
>
> ri_RelationDesc = 0x180, this address is not invalid.
Does the query crash or are you just able to see invalid memory
address using gdb?
git reset --hard cbc127917e
The query will crash on relationDesc = RelationGetDescr(resultRelInfo->ri_RelationDesc);
#0 ExecInitMerge (mtstate=0x55b08fc19a00, estate=0x55b08fc18c20) at nodeModifyTable.c:3663
3663 relationDesc = RelationGetDescr(resultRelInfo->ri_RelationDesc);
(gdb) bt
#0 ExecInitMerge (mtstate=0x55b08fc19a00, estate=0x55b08fc18c20) at nodeModifyTable.c:3663
#1 0x000055b07f1046af in ExecInitModifyTable (node=0x55b08fc3d338, estate=0x55b08fc18c20, eflags=0) at nodeModifyTable.c:4889
#2 0x000055b07f0bdd77 in ExecInitNode (node=0x55b08fc3d338, estate=0x55b08fc18c20, eflags=0) at execProcnode.c:177
#3 0x000055b07f0b26a9 in InitPlan (queryDesc=0x55b08fbe4990, eflags=0) at execMain.c:985
#4 0x000055b07f0b1435 in standard_ExecutorStart (queryDesc=0x55b08fbe4990, eflags=0) at execMain.c:259
#5 0x000055b07f0b1143 in ExecutorStart (queryDesc=0x55b08fbe4990, eflags=0) at execMain.c:135
#6 0x000055b07f395fab in ProcessQuery (plan=0x55b08fc3f0f8,
sourceText=0x55b08fb19420 "merge into part_abc_view pt\nusing (select stable_one() + 2 as pid) as q join part_abc_1 pt1 on (true)\non pt.a = stable_one() +2\nwhen not matched then insert values(1, 'd', false);", params=0x0, queryEnv=0x0,
dest=0x55b08fc3f278, qc=0x7ffc72bad230) at pquery.c:155
#7 0x000055b07f397af1 in PortalRunMulti (portal=0x55b08fb99c40, isTopLevel=true, setHoldSnapshot=false, dest=0x55b08fc3f278, altdest=0x55b08fc3f278, qc=0x7ffc72bad230) at pquery.c:1271
#8 0x000055b07f396ffd in PortalRun (portal=0x55b08fb99c40, count=9223372036854775807, isTopLevel=true, dest=0x55b08fc3f278, altdest=0x55b08fc3f278, qc=0x7ffc72bad230) at pquery.c:787
#9 0x000055b07f38fafd in exec_simple_query (
query_string=0x55b08fb19420 "merge into part_abc_view pt\nusing (select stable_one() + 2 as pid) as q join part_abc_1 pt1 on (true)\non pt.a = stable_one() +2\nwhen not matched then insert values(1, 'd', false);") at postgres.c:1271
#10 0x000055b07f394e65 in PostgresMain (dbname=0x55b08fb533e8 "postgres", username=0x55b08fb533c8 "ecs-user") at postgres.c:4691
#11 0x000055b07f38b71a in BackendMain (startup_data=0x7ffc72bad4b0 "", startup_data_len=4) at backend_startup.c:107
#12 0x000055b07f298873 in postmaster_child_launch (child_type=B_BACKEND, child_slot=1, startup_data=0x7ffc72bad4b0 "", startup_data_len=4, client_sock=0x7ffc72bad510) at launch_backend.c:274
#13 0x000055b07f29f1ce in BackendStartup (client_sock=0x7ffc72bad510) at postmaster.c:3519
#14 0x000055b07f29c7ec in ServerLoop () at postmaster.c:1688
#15 0x000055b07f29c0e4 in PostmasterMain (argc=3, argv=0x55b08fb13a30) at postmaster.c:1386
#16 0x000055b07f13fbd1 in main (argc=3, argv=0x55b08fb13a30) at main.c:230
3663 relationDesc = RelationGetDescr(resultRelInfo->ri_RelationDesc);
(gdb) bt
#0 ExecInitMerge (mtstate=0x55b08fc19a00, estate=0x55b08fc18c20) at nodeModifyTable.c:3663
#1 0x000055b07f1046af in ExecInitModifyTable (node=0x55b08fc3d338, estate=0x55b08fc18c20, eflags=0) at nodeModifyTable.c:4889
#2 0x000055b07f0bdd77 in ExecInitNode (node=0x55b08fc3d338, estate=0x55b08fc18c20, eflags=0) at execProcnode.c:177
#3 0x000055b07f0b26a9 in InitPlan (queryDesc=0x55b08fbe4990, eflags=0) at execMain.c:985
#4 0x000055b07f0b1435 in standard_ExecutorStart (queryDesc=0x55b08fbe4990, eflags=0) at execMain.c:259
#5 0x000055b07f0b1143 in ExecutorStart (queryDesc=0x55b08fbe4990, eflags=0) at execMain.c:135
#6 0x000055b07f395fab in ProcessQuery (plan=0x55b08fc3f0f8,
sourceText=0x55b08fb19420 "merge into part_abc_view pt\nusing (select stable_one() + 2 as pid) as q join part_abc_1 pt1 on (true)\non pt.a = stable_one() +2\nwhen not matched then insert values(1, 'd', false);", params=0x0, queryEnv=0x0,
dest=0x55b08fc3f278, qc=0x7ffc72bad230) at pquery.c:155
#7 0x000055b07f397af1 in PortalRunMulti (portal=0x55b08fb99c40, isTopLevel=true, setHoldSnapshot=false, dest=0x55b08fc3f278, altdest=0x55b08fc3f278, qc=0x7ffc72bad230) at pquery.c:1271
#8 0x000055b07f396ffd in PortalRun (portal=0x55b08fb99c40, count=9223372036854775807, isTopLevel=true, dest=0x55b08fc3f278, altdest=0x55b08fc3f278, qc=0x7ffc72bad230) at pquery.c:787
#9 0x000055b07f38fafd in exec_simple_query (
query_string=0x55b08fb19420 "merge into part_abc_view pt\nusing (select stable_one() + 2 as pid) as q join part_abc_1 pt1 on (true)\non pt.a = stable_one() +2\nwhen not matched then insert values(1, 'd', false);") at postgres.c:1271
#10 0x000055b07f394e65 in PostgresMain (dbname=0x55b08fb533e8 "postgres", username=0x55b08fb533c8 "ecs-user") at postgres.c:4691
#11 0x000055b07f38b71a in BackendMain (startup_data=0x7ffc72bad4b0 "", startup_data_len=4) at backend_startup.c:107
#12 0x000055b07f298873 in postmaster_child_launch (child_type=B_BACKEND, child_slot=1, startup_data=0x7ffc72bad4b0 "", startup_data_len=4, client_sock=0x7ffc72bad510) at launch_backend.c:274
#13 0x000055b07f29f1ce in BackendStartup (client_sock=0x7ffc72bad510) at postmaster.c:3519
#14 0x000055b07f29c7ec in ServerLoop () at postmaster.c:1688
#15 0x000055b07f29c0e4 in PostmasterMain (argc=3, argv=0x55b08fb13a30) at postmaster.c:1386
#16 0x000055b07f13fbd1 in main (argc=3, argv=0x55b08fb13a30) at main.c:230
But after you commit the 75dfde1, crash happened on ExecMergeNotMatched(),
econtext->ecxt_scantuple = NULL;
because econtext is null.
After 75dfde1, mergeActionLists is NIL, it directly returns in if (mergeActionLists == NIL).
I think the crash and the inconsistent result are all caused by cbc127917e.
Thanks,
Tender Wang
On Tue, 4 Mar 2025 at 09:30, David Rowley <dgrowleyml@gmail.com> wrote: > > It looks like this is happening because ExecInitModifyTable() skips > adding mergeActionsList items when bms_is_member(rti, > estate->es_unpruned_relids) is false for all resultRelations. This > results in an empty actions list. Because the MERGE is performing a > LEFT JOIN for the NOT MATCHED, ExecMerge() gets a row and runs > ExecMergeNotMatched(), which crashes on "econtext->ecxt_scantuple = > NULL;" because of a NULL econtext. econtext is NULL because > ExecInitMerge() skips calling ExecAssignExprContext() when > mergeActionLists is empty. > > There are a couple of ways I can see to fix this, 1) would be to move > the ExecAssignExprContext() above the "if (mergeActionLists == NIL)" > in ExecInitMerge(), or 2) add code to return NULL in > ExecMergeNotMatched() if actionStates is NULL. Hmm, I don't think that's right. I think this is just masking the problem. I think the real problem is that, as things stand, ExecInitModifyTable() must never be allowed to prune every leaf partition, because there are multiple places that rely on there being at least one resultRelInfo. For example, ExecModifyTable() passes the first resultRelInfo to ExecMerge() and ExecMergeNotMatched() in the NOT MATCHED case, and those use the action lists in that resultRelInfo to work out what action to execute. The fact that it didn't crash with this patch is probably just luck, because it's passing a pointer to uninitialised memory, but it's still doing the wrong thing by not invoking any merge actions. Similarly, if MERGE does an INSERT on a partitioned table, the code in ExecInitPartitionInfo() from ExecFindPartition() assumes that mtstate->resultRelInfo has at least one entry. So I think the simplest solution is to arrange for ExecInitModifyTable() to always include details of at least one result relation, adding at least one entry to each of the lists. As an alternative, it might be possible to make the executor cope with an empty resultRelInfo array (by using the root resultRelInfo instead, where one is needed), but I think that would be a bigger change. Regards, Dean
Dean Rasheed <dean.a.rasheed@gmail.com> 于2025年3月5日周三 02:56写道:
On Tue, 4 Mar 2025 at 09:30, David Rowley <dgrowleyml@gmail.com> wrote:
>
> It looks like this is happening because ExecInitModifyTable() skips
> adding mergeActionsList items when bms_is_member(rti,
> estate->es_unpruned_relids) is false for all resultRelations. This
> results in an empty actions list. Because the MERGE is performing a
> LEFT JOIN for the NOT MATCHED, ExecMerge() gets a row and runs
> ExecMergeNotMatched(), which crashes on "econtext->ecxt_scantuple =
> NULL;" because of a NULL econtext. econtext is NULL because
> ExecInitMerge() skips calling ExecAssignExprContext() when
> mergeActionLists is empty.
>
> There are a couple of ways I can see to fix this, 1) would be to move
> the ExecAssignExprContext() above the "if (mergeActionLists == NIL)"
> in ExecInitMerge(), or 2) add code to return NULL in
> ExecMergeNotMatched() if actionStates is NULL.
Hmm, I don't think that's right. I think this is just masking the problem.
I think the real problem is that, as things stand,
ExecInitModifyTable() must never be allowed to prune every leaf
partition, because there are multiple places that rely on there being
at least one resultRelInfo.
For example, ExecModifyTable() passes the first resultRelInfo to
ExecMerge() and ExecMergeNotMatched() in the NOT MATCHED case, and
those use the action lists in that resultRelInfo to work out what
action to execute. The fact that it didn't crash with this patch is
probably just luck, because it's passing a pointer to uninitialised
memory, but it's still doing the wrong thing by not invoking any merge
actions.
Yeah, I agree with you. As I said before, this fix will get a wrong result compared to
enable_partition_pruning = off, even though no crash happened again.
At least for NOT MATCH, we must have resultRelInfo entry.
Similarly, if MERGE does an INSERT on a partitioned table, the code in
ExecInitPartitionInfo() from ExecFindPartition() assumes that
mtstate->resultRelInfo has at least one entry.
So I think the simplest solution is to arrange for
ExecInitModifyTable() to always include details of at least one result
relation, adding at least one entry to each of the lists.
I have not tried this way. I'm not sure about this.
As an
alternative, it might be possible to make the executor cope with an
empty resultRelInfo array (by using the root resultRelInfo instead,
where one is needed), but I think that would be a bigger change.
Yeah, this way changes a lot of codes.
Thanks,
Tender Wang
On Wed, Mar 5, 2025 at 3:56 AM Dean Rasheed <dean.a.rasheed@gmail.com> wrote: > On Tue, 4 Mar 2025 at 09:30, David Rowley <dgrowleyml@gmail.com> wrote: > > > > It looks like this is happening because ExecInitModifyTable() skips > > adding mergeActionsList items when bms_is_member(rti, > > estate->es_unpruned_relids) is false for all resultRelations. This > > results in an empty actions list. Because the MERGE is performing a > > LEFT JOIN for the NOT MATCHED, ExecMerge() gets a row and runs > > ExecMergeNotMatched(), which crashes on "econtext->ecxt_scantuple = > > NULL;" because of a NULL econtext. econtext is NULL because > > ExecInitMerge() skips calling ExecAssignExprContext() when > > mergeActionLists is empty. > > > > There are a couple of ways I can see to fix this, 1) would be to move > > the ExecAssignExprContext() above the "if (mergeActionLists == NIL)" > > in ExecInitMerge(), or 2) add code to return NULL in > > ExecMergeNotMatched() if actionStates is NULL. > > Hmm, I don't think that's right. I think this is just masking the problem. > > I think the real problem is that, as things stand, > ExecInitModifyTable() must never be allowed to prune every leaf > partition, because there are multiple places that rely on there being > at least one resultRelInfo. > > For example, ExecModifyTable() passes the first resultRelInfo to > ExecMerge() and ExecMergeNotMatched() in the NOT MATCHED case, and > those use the action lists in that resultRelInfo to work out what > action to execute. The fact that it didn't crash with this patch is > probably just luck, because it's passing a pointer to uninitialised > memory, but it's still doing the wrong thing by not invoking any merge > actions. > > Similarly, if MERGE does an INSERT on a partitioned table, the code in > ExecInitPartitionInfo() from ExecFindPartition() assumes that > mtstate->resultRelInfo has at least one entry. > > So I think the simplest solution is to arrange for > ExecInitModifyTable() to always include details of at least one result > relation, adding at least one entry to each of the lists. Thanks -- I’ve studied the code and I agree with the conclusion: when all result relations are pruned, we still need to lock and process the first one to preserve executor invariants. Your examples with ExecMerge() and ExecInitPartitionInfo() make the consequences of missing resultRelInfo[0] pretty clear. Locking and including the first result relation, even if pruned, seems like the most straightforward way to maintain correctness without deeper structural changes. I've come up with the attached. I'll still need to add a test case. -- Thanks, Amit Langote
Attachment
Amit Langote <amitlangote09@gmail.com> 于2025年3月12日周三 20:24写道:
Thanks -- I’ve studied the code and I agree with the conclusion: when
all result relations are pruned, we still need to lock and process the
first one to preserve executor invariants.
Your examples with ExecMerge() and ExecInitPartitionInfo() make the
consequences of missing resultRelInfo[0] pretty clear. Locking and
including the first result relation, even if pruned, seems like the
most straightforward way to maintain correctness without deeper
structural changes.
I've come up with the attached. I'll still need to add a test case.
It looks good to me, on a quick read-through.
We now don't have merge into ... not match ... test case in partition_prune.sql
In [1], I figured out a query that could trigger a crash. It may be used as the test case.
create table part_abc (a int, b text, c bool) partition by list (a);
create table part_abc_1 (b text, a int, c bool);
create table part_abc_2 (a int, c bool, b text);
alter table part_abc attach partition part_abc_1 for values in (1);
alter table part_abc attach partition part_abc_2 for values in (2);
insert into part_abc values (1, 'b', true);
insert into part_abc values (2, 'c', true);
create view part_abc_view as select * from part_abc where b <> 'a' with check option;
create function stable_one() returns int as $$ begin return 1; end; $$ language plpgsql stable;
create table part_abc_1 (b text, a int, c bool);
create table part_abc_2 (a int, c bool, b text);
alter table part_abc attach partition part_abc_1 for values in (1);
alter table part_abc attach partition part_abc_2 for values in (2);
insert into part_abc values (1, 'b', true);
insert into part_abc values (2, 'c', true);
create view part_abc_view as select * from part_abc where b <> 'a' with check option;
create function stable_one() returns int as $$ begin return 1; end; $$ language plpgsql stable;
Above SQLs are already in parition_prune.sql, only need the below SQLs.
explain (costs off)
merge into part_abc_view pt
using (select stable_one() + 2 as pid) as q join part_abc_1 pt1 on (true)
on pt.a = stable_one() +2
when not matched then insert values(1, 'd', false);
merge into part_abc_view pt
using (select stable_one() + 2 as pid) as q join part_abc_1 pt1 on (true)
on pt.a = stable_one() +2
when not matched then insert values(1, 'd', false);
explain (costs off)
merge into part_abc_view pt
using (select stable_one() + 2 as pid) as q join part_abc_1 pt1 on (true)
on pt.a = stable_one() +2
when not matched then insert values(1, 'd', false);
merge into part_abc_view pt
using (select stable_one() + 2 as pid) as q join part_abc_1 pt1 on (true)
on pt.a = stable_one() +2
when not matched then insert values(1, 'd', false);
Thanks,
Tender Wang
On Wed, 12 Mar 2025 at 12:24, Amit Langote <amitlangote09@gmail.com> wrote: > > Thanks -- I’ve studied the code and I agree with the conclusion: when > all result relations are pruned, we still need to lock and process the > first one to preserve executor invariants. > > Your examples with ExecMerge() and ExecInitPartitionInfo() make the > consequences of missing resultRelInfo[0] pretty clear. Locking and > including the first result relation, even if pruned, seems like the > most straightforward way to maintain correctness without deeper > structural changes. > > I've come up with the attached. I'll still need to add a test case. Hmm, this isn't quite sufficient. In ExecDoInitialPruning(), stmt->resultRelations contains all the result relations plus the root relation (if set) of each ModifyTable node in the planned stmt (see set_plan_refs()). Therefore, just adding the first result relation in that list may not be sufficient to ensure that every ModifyTable node ends up with at least one unpruned result relation. For example, consider: create table foo (a int); create table part_abc (a int, b text, c bool) partition by list (a); create table part_abc_1 (b text, c bool, a int); create table part_abc_2 (b text, a int, c bool); alter table part_abc attach partition part_abc_1 for values in (1); alter table part_abc attach partition part_abc_2 for values in (2); insert into part_abc_1 values ('b', true, 1); insert into part_abc_2 values ('c', 2, true); create function stable_one() returns int as $$ begin return 1; end; $$ language plpgsql stable; with t as ( merge into part_abc pt using (values(1)) v(a) on pt.a = stable_one() + 2 when not matched then insert values (v.a, 'd', false) returning pt.* ) insert into foo select a from t; ERROR: trying to open a pruned relation The problem here is that "foo" is the first result relation in stmt->resultRelations, so that's the one that it chooses not to prune, which is of no use to the merge. Regards, Dean
Dean Rasheed <dean.a.rasheed@gmail.com> 于2025年3月12日周三 22:59写道:
On Wed, 12 Mar 2025 at 12:24, Amit Langote <amitlangote09@gmail.com> wrote:
>
> Thanks -- I’ve studied the code and I agree with the conclusion: when
> all result relations are pruned, we still need to lock and process the
> first one to preserve executor invariants.
>
> Your examples with ExecMerge() and ExecInitPartitionInfo() make the
> consequences of missing resultRelInfo[0] pretty clear. Locking and
> including the first result relation, even if pruned, seems like the
> most straightforward way to maintain correctness without deeper
> structural changes.
>
> I've come up with the attached. I'll still need to add a test case.
Hmm, this isn't quite sufficient.
In ExecDoInitialPruning(), stmt->resultRelations contains all the
result relations plus the root relation (if set) of each ModifyTable
node in the planned stmt (see set_plan_refs()). Therefore, just adding
the first result relation in that list may not be sufficient to ensure
that every ModifyTable node ends up with at least one unpruned result
relation.
For example, consider:
create table foo (a int);
create table part_abc (a int, b text, c bool) partition by list (a);
create table part_abc_1 (b text, c bool, a int);
create table part_abc_2 (b text, a int, c bool);
alter table part_abc attach partition part_abc_1 for values in (1);
alter table part_abc attach partition part_abc_2 for values in (2);
insert into part_abc_1 values ('b', true, 1);
insert into part_abc_2 values ('c', 2, true);
create function stable_one() returns int as
$$ begin return 1; end; $$ language plpgsql stable;
with t as (
merge into part_abc pt using (values(1)) v(a) on pt.a = stable_one() + 2
when not matched then insert values (v.a, 'd', false)
returning pt.*
)
insert into foo select a from t;
ERROR: trying to open a pruned relation
Nice catch. The above case can be another test case.
I can find another same error in the below query:
with t as (update part_abc set c = true where a = stable_one() +2 returning a) insert into foo select a from t;
Thanks,
Tender Wang
On Wed, 12 Mar 2025 at 16:11, Tender Wang <tndrwang@gmail.com> wrote: > > I can find another same error in the below query: > > with t as (update part_abc set c = true where a = stable_one() +2 returning a) insert into foo select a from t; > Interesting example. That passes in HEAD, but fails with the v1 patch because it is attempting to open a pruned result relation when it doesn't need to. Attached is a rough patch that attempts to solve these issues locally in ExecInitModifyTable(), by not allowing all result relations to be pruned for that node. That makes it easy to handle cases with multiple ModifyTable nodes. This also has the advantage that all these relations can continue to be pruned from the subplans of the ModifyTable nodes, making those scans more efficient, and it only keeps a pruned result relation if all other result relations have been pruned. It does mean that ExecGetRangeTableRelation() needs to allow pruned relations to be opened, if called from ExecInitResultRelation(). I think that's OK because the check for opening pruned relations still applies to scan relations. I also adjusted explain.c slightly, to avoid the dummy pruned result relation that we might now keep from appearing in the EXPLAIN output as a target relation. Allowing it to be shown looked a bit odd, because it's not really the target relation in any real sense. I also noticed a few Asserts in ExecInitModifyTable() that didn't appear to be doing what they were originally intended to do. Regards, Dean
Attachment
Dean Rasheed <dean.a.rasheed@gmail.com> 于2025年3月13日周四 03:23写道:
On Wed, 12 Mar 2025 at 16:11, Tender Wang <tndrwang@gmail.com> wrote:
>
> I can find another same error in the below query:
>
> with t as (update part_abc set c = true where a = stable_one() +2 returning a) insert into foo select a from t;
>
Interesting example. That passes in HEAD, but fails with the v1 patch
because it is attempting to open a pruned result relation when it
doesn't need to.
Attached is a rough patch that attempts to solve these issues locally
in ExecInitModifyTable(), by not allowing all result relations to be
pruned for that node. That makes it easy to handle cases with multiple
ModifyTable nodes.
This also has the advantage that all these relations can continue to
be pruned from the subplans of the ModifyTable nodes, making those
scans more efficient, and it only keeps a pruned result relation if
all other result relations have been pruned.
It does mean that ExecGetRangeTableRelation() needs to allow pruned
relations to be opened, if called from ExecInitResultRelation(). I
think that's OK because the check for opening pruned relations still
applies to scan relations.
I also adjusted explain.c slightly, to avoid the dummy pruned result
relation that we might now keep from appearing in the EXPLAIN output
as a target relation. Allowing it to be shown looked a bit odd,
because it's not really the target relation in any real sense.
I also noticed a few Asserts in ExecInitModifyTable() that didn't
appear to be doing what they were originally intended to do.
I tested and debugged the attached patch. I didn't find any other issue.
A comment in execPartition.h may need to be changed:
* partrel Partitioned table Relation; obtained by
* ExecGetRangeTableRelation(estate, rti), where
* ExecGetRangeTableRelation(estate, rti), where
Because the func interface has changed, the above comments are better to be changed "ExecGetRangeTableRelation(estate, rti, false)".
Thanks,
Tender Wang
On Thu, Mar 13, 2025 at 4:23 AM Dean Rasheed <dean.a.rasheed@gmail.com> wrote: > On Wed, 12 Mar 2025 at 16:11, Tender Wang <tndrwang@gmail.com> wrote: > > > > I can find another same error in the below query: > > > > with t as (update part_abc set c = true where a = stable_one() +2 returning a) insert into foo select a from t; > > Yes, I missed that PlannedStmt.resultRelations can include entries from multiple ModifyTable nodes -- not the first time I’ve overlooked that. :( > Interesting example. That passes in HEAD, but fails with the v1 patch > because it is attempting to open a pruned result relation when it > doesn't need to. > > Attached is a rough patch that attempts to solve these issues locally > in ExecInitModifyTable(), by not allowing all result relations to be > pruned for that node. That makes it easy to handle cases with multiple > ModifyTable nodes. Thanks for the patch. I have some comments on the approach. > This also has the advantage that all these relations can continue to > be pruned from the subplans of the ModifyTable nodes, making those > scans more efficient, and it only keeps a pruned result relation if > all other result relations have been pruned. > > It does mean that ExecGetRangeTableRelation() needs to allow pruned > relations to be opened, if called from ExecInitResultRelation(). I > think that's OK because the check for opening pruned relations still > applies to scan relations. + if (isResultRel && pruned) + { + /* + * A pruned result relation might not have been locked yet, so we + * must lock it now. + */ + rel = table_open(rte->relid, rte->rellockmode); + } One thing I'd like to avoid is taking any locks during ExecInitNode(), including in functions like ExecGetRangeTableRelation(). Doing so would require checking whether the CachedPlan is still valid and handling invalidation if it isn't -- see commit 525392d5727f for background. Handling that correctly would mean aborting plan initialization, cleaning up any partially initialized nodes, and returning early from ExecInitNode(). To avoid that complexity, I've arranged for all remaining necessary locks -- specifically on leaf partitions that survive pruning -- to be taken in ExecDoInitialPruning(), which is invoked by InitPlan() before any plan tree is initialized, and checking CachedPlan validity once after returning from it: /* ---------------------------------------------------------------- * InitPlan * * Initializes the query plan: open files, allocate storage * and start up the rule manager * * If the plan originates from a CachedPlan (given in queryDesc->cplan), * it can become invalid during runtime "initial" pruning when the * remaining set of locks is taken. The function returns early in that * case without initializing the plan, and the caller is expected to * retry with a new valid plan. * ---------------------------------------------------------------- */ static void InitPlan(QueryDesc *queryDesc, int eflags) { ... ExecDoInitialPruning(estate); if (!ExecPlanStillValid(estate)) return; So I'd prefer to keep all locking outside of ExecInitNode() for that reason. One way to deal with multiple ModifyTable nodes is to have PlannedStmt remember the "first" result relation of each, either as a List or a Bitmapset. But I’m not sure Tom would be on board with that, considering the work we did around 2018, e.g., commit 52ed730d511. Also, adding a new field to PlannedStmt just to support a corner case like this might be overkill -- though I’m fine with it if there’s no other non-invasive way to address the issue. The alternative you mentioned earlier -- modifying the code to use ModifyTableState.rootResultRelInfo instead of resultRelInfo[0] -- seems at least somewhat invasive. I’ve attached v3, which implements the above approach. It also includes your changes to explain.c, the ExecGetRangeTableRelation() interface adjustment, the Assert fixes in ExecInitModifyTable(), and the test cases. > I also adjusted explain.c slightly, to avoid the dummy pruned result > relation that we might now keep from appearing in the EXPLAIN output > as a target relation. Allowing it to be shown looked a bit odd, > because it's not really the target relation in any real sense. I agree about this part. > I also noticed a few Asserts in ExecInitModifyTable() that didn't > appear to be doing what they were originally intended to do. Check too. -- Thanks, Amit Langote
Attachment
On Thu, Mar 13, 2025 at 4:29 PM Tender Wang <tndrwang@gmail.com> wrote: > Dean Rasheed <dean.a.rasheed@gmail.com> 于2025年3月13日周四 03:23写道: >> It does mean that ExecGetRangeTableRelation() needs to allow pruned >> relations to be opened, if called from ExecInitResultRelation(). I >> think that's OK because the check for opening pruned relations still >> applies to scan relations. > > A comment in execPartition.h may need to be changed: > * partrel Partitioned table Relation; obtained by > * ExecGetRangeTableRelation(estate, rti), where > > Because the func interface has changed, the above comments are better to be changed "ExecGetRangeTableRelation(estate,rti, false)". Noted, thanks. -- Thanks, Amit Langote
On Thu, 13 Mar 2025 at 12:44, Amit Langote <amitlangote09@gmail.com> wrote: > > One thing I'd like to avoid is taking any locks during ExecInitNode(), > including in functions like ExecGetRangeTableRelation(). Doing so > would require checking whether the CachedPlan is still valid and > handling invalidation if it isn't -- see commit 525392d5727f for > background. > Yes, I wondered about that. I saw that earlier commit, and the notes in the executor README, but I still don't quite understand. Specifically, how is locking this pruned leaf partition in ExecInitModifyTable() different from locking any other pruned leaf partition later in the INSERT path? (For example, as part of an UPDATE that moves a tuple from an unpruned partition to a pruned one.) Regards, Dean
On Thu, Mar 13, 2025 at 11:42 PM Dean Rasheed <dean.a.rasheed@gmail.com> wrote: > On Thu, 13 Mar 2025 at 12:44, Amit Langote <amitlangote09@gmail.com> wrote: > > > > One thing I'd like to avoid is taking any locks during ExecInitNode(), > > including in functions like ExecGetRangeTableRelation(). Doing so > > would require checking whether the CachedPlan is still valid and > > handling invalidation if it isn't -- see commit 525392d5727f for > > background. > > Yes, I wondered about that. I saw that earlier commit, and the notes > in the executor README, but I still don't quite understand. > Specifically, how is locking this pruned leaf partition in > ExecInitModifyTable() different from locking any other pruned leaf > partition later in the INSERT path? (For example, as part of an UPDATE > that moves a tuple from an unpruned partition to a pruned one.) I think the key distinction is that the result relation we’re locking was already present in the plan, and the executor code that uses resultRelInfo[0] doesn’t rely on traversing the plan tree -- it operates entirely on the state info built in ExecInitModifyTable(). So in this specific case, the usual plan invalidation hazard may not apply in the same way. I understand the comparison to locking pruned partitions during UPDATE execution, but I think that case is different: by then, the executor is fully initialized, and while invalidations can still happen, they’re assumed to be harmless for the remainder of execution. That’s why the system doesn’t check for plan validity or attempt to restart once execution has begun -- it’s expected to continue with the existing plan, under the assumption that any changes would only affect pruned partitions that are unreachable, especially via the scan path. Since those plan nodes remain unreferenced during tuple routing which only uses state initialized during execution, such changes don’t impact correctness. That said, my concern remains that taking any new locks during ExecInitNode() risks triggering CachedPlan validation logic that we’ve intentionally avoided dealing with, as per 525392d5727f. Even if this particular usage might not trip over it, it still feels like we’re stepping into a sensitive area. So I’d prefer to keep the locking in ExecDoInitialPruning(), where interacting with plan invalidation is expected and safe — and we don’t risk leaving the executor in a partially initialized state. -- Thanks, Amit Langote
On Fri, 14 Mar 2025 at 02:48, Amit Langote <amitlangote09@gmail.com> wrote: > > That said, my concern remains that taking any new locks during > ExecInitNode() risks triggering CachedPlan validation logic that we’ve > intentionally avoided dealing with, as per 525392d5727f. Even if this > particular usage might not trip over it, it still feels like we’re > stepping into a sensitive area. So I’d prefer to keep the locking in > ExecDoInitialPruning(), where interacting with plan invalidation is > expected and safe — and we don’t risk leaving the executor in a > partially initialized state. > OK, I think that makes sense. Trapping any plan invalidations early, after ExecDoInitialPruning(), and before initialising the rest of the executor state does seem preferable for that reason. I have a couple of comments on the v3 patch: * In ExecInitModifyTable(), it's possible to avoid code duplication. I think that's worth it to make the code more maintainable if more per-result relation logic is added in the future. * In executor.h, the name of the new function argument doesn't match the .c source file. * In ExecDoInitialPruning(), there is room for improvement: we actually only need to lock the first result relation if all other result relations of the ModifyTable node have been pruned. As it stands, there's no easy way to tell that, so I've just made a note of it in a comment. I wondered about replacing the new field "firstResultRels" with something like "mtResultRels", which would be a list of lists, to allow that, but I didn't try it. I'm attaching v4, which addresses those comments, and includes a few other comment update suggestions. Regards, Dean
Attachment
On Sun, Mar 16, 2025 at 6:57 PM Dean Rasheed <dean.a.rasheed@gmail.com> wrote: > On Fri, 14 Mar 2025 at 02:48, Amit Langote <amitlangote09@gmail.com> wrote:> > > That said, my concern remains that taking any new locks during > > ExecInitNode() risks triggering CachedPlan validation logic that we’ve > > intentionally avoided dealing with, as per 525392d5727f. Even if this > > particular usage might not trip over it, it still feels like we’re > > stepping into a sensitive area. So I’d prefer to keep the locking in > > ExecDoInitialPruning(), where interacting with plan invalidation is > > expected and safe — and we don’t risk leaving the executor in a > > partially initialized state. > > > > OK, I think that makes sense. Trapping any plan invalidations early, > after ExecDoInitialPruning(), and before initialising the rest of the > executor state does seem preferable for that reason. > > I have a couple of comments on the v3 patch: > > * In ExecInitModifyTable(), it's possible to avoid code duplication. I > think that's worth it to make the code more maintainable if more > per-result relation logic is added in the future. > > * In executor.h, the name of the new function argument doesn't match > the .c source file. Thanks for making those changes. > * In ExecDoInitialPruning(), there is room for improvement: we > actually only need to lock the first result relation if all other > result relations of the ModifyTable node have been pruned. As it > stands, there's no easy way to tell that, so I've just made a note of > it in a comment. I wondered about replacing the new field > "firstResultRels" with something like "mtResultRels", which would be a > list of lists, to allow that, but I didn't try it. Or a List of Bitmapset, so that bms_intersect() can be used to check whether all result relations of a given ModifyTable have been pruned. But whether it’s that or a List of Lists, both would add a potentially large amount of memory to PlannedStmt, to avoid taking an extra lock. Probably not a great tradeoff, but we can revisit if it becomes worthwhile? > I'm attaching v4, which addresses those comments, and includes a few > other comment update suggestions. I've attached v5 with my commit message but were you planning to commit it yourself? If not, does the commit message look good to you? -- Thanks, Amit Langote
Attachment
On Mon, 17 Mar 2025 at 12:21, Amit Langote <amitlangote09@gmail.com> wrote: > > > I wondered about replacing the new field > > "firstResultRels" with something like "mtResultRels", which would be a > > list of lists, to allow that, but I didn't try it. > > Or a List of Bitmapset, so that bms_intersect() can be used to check > whether all result relations of a given ModifyTable have been pruned. > But whether it’s that or a List of Lists, both would add a potentially > large amount of memory to PlannedStmt, to avoid taking an extra lock. > Probably not a great tradeoff, but we can revisit if it becomes > worthwhile? Yeah. It would have been nice to fix this without adding a new field to PlannedStmt at all, but I couldn't see an easy way to do that. Actually I don't think a list of lists or Btimapsets would be that large. There are unlikely to be more than one or two ModifyTable nodes in most queries, and the number of bottom-level elements would be less than in the flat resultRelations list. But still, it doesn't really seem worth the effort. > I've attached v5 with my commit message but were you planning to > commit it yourself? I'll leave it to you, since it was your commit originally :-) > If not, does the commit message look good to you? It could perhaps be more specific about saying that ExecInitModifyTable() includes the first result relation if all others have been pruned, so it isn't always included. Regards, Dean
On Tue, Mar 18, 2025 at 8:36 AM Dean Rasheed <dean.a.rasheed@gmail.com> wrote: > On Mon, 17 Mar 2025 at 12:21, Amit Langote <amitlangote09@gmail.com> wrote: > > > > > I wondered about replacing the new field > > > "firstResultRels" with something like "mtResultRels", which would be a > > > list of lists, to allow that, but I didn't try it. > > > > Or a List of Bitmapset, so that bms_intersect() can be used to check > > whether all result relations of a given ModifyTable have been pruned. > > But whether it’s that or a List of Lists, both would add a potentially > > large amount of memory to PlannedStmt, to avoid taking an extra lock. > > Probably not a great tradeoff, but we can revisit if it becomes > > worthwhile? > > Yeah. It would have been nice to fix this without adding a new field > to PlannedStmt at all, but I couldn't see an easy way to do that. > Actually I don't think a list of lists or Btimapsets would be that > large. There are unlikely to be more than one or two ModifyTable nodes > in most queries, and the number of bottom-level elements would be less > than in the flat resultRelations list. But still, it doesn't really > seem worth the effort. Ok, thanks -- appreciate your thoughts. I agree it would’ve been nice to avoid adding a new field to PlannedStmt, but I couldn’t see a simple way around it either, given the lock timing restrictions. > > I've attached v5 with my commit message but were you planning to > > commit it yourself? > > I'll leave it to you, since it was your commit originally :-) Yes, of course. :-) > > If not, does the commit message look good to you? > > It could perhaps be more specific about saying that > ExecInitModifyTable() includes the first result relation if all others > have been pruned, so it isn't always included. Ok, done in the attached. I'll push this tomorrow. -- Thanks, Amit Langote
Attachment
On Tue, Mar 18, 2025 at 4:52 PM Amit Langote <amitlangote09@gmail.com> wrote: > On Tue, Mar 18, 2025 at 8:36 AM Dean Rasheed <dean.a.rasheed@gmail.com> wrote: > > On Mon, 17 Mar 2025 at 12:21, Amit Langote <amitlangote09@gmail.com> wrote: > > > If not, does the commit message look good to you? > > > > It could perhaps be more specific about saying that > > ExecInitModifyTable() includes the first result relation if all others > > have been pruned, so it isn't always included. > > Ok, done in the attached. > > I'll push this tomorrow. Pushed after a few more tweaks to the commit message for better logical flow, remove a redundant sentence. Thanks a lot all for your help on this. -- Thanks, Amit Langote