Thread: segmentation fault with simple UPDATE statement (postgres 10.5)
Hi!
We encountered a bug in our systems with update statement, but long story short, here's the self-containing test case which results in segmentation fault.
CREATE TABLE t1 (a VARCHAR(1));
CREATE TABLE t2 (b VARCHAR(1));
INSERT INTO t1 VALUES ('A');
INSERT INTO t2 VALUES ('A');
COMMIT;
UPDATE t1
SET (a) = (SELECT b FROM t2 WHERE t2.b = t1.a)
WHERE 'X' NOT IN ('Y', 'Z');
Running this (reliably) will result in:
2018-12-12 17:20:27.542 EET [7211] LOG: server process (PID 20620) was terminated by signal 11: Segmentation fault
2018-12-12 17:20:27.542 EET [7211] DETAIL: Failed process was running: UPDATE t1
SET (a) = (SELECT b FROM t2 WHERE t2.b = t1.a)
WHERE 'X' NOT IN ('Y', 'Z')
2018-12-12 17:20:27.542 EET [7211] LOG: terminating any other active server processes
2018-12-12 17:20:27.542 EET [20617] WARNING: terminating connection because of crash of another server process
2018-12-12 17:20:27.542 EET [20617] DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2018-12-12 17:20:27.542 EET [20617] HINT: In a moment you should be able to reconnect to the database and repeat your command.
2018-12-12 17:20:27.546 EET [7211] LOG: all server processes terminated; reinitializing
2018-12-12 17:20:27.623 EET [20626] LOG: database system was interrupted; last known up at 2018-12-12 17:20:05 EET
OS logs show:
[2336281.740086] postgres[20477]: segfault at 8 ip 00005640b1892902 sp 00007ffdaa8427e0 error 4 in postgres[5640b17d5000+6da000]
It happened for us in ubuntu 16.04 with 4.19.2 kernel as well as CentOS 7 3.10.0 kernel, so I assume it's OS independent, although I can not verify it on windows.
Postgres version: postgresql-10, 10.5-2.pgdg16.04+1 on Ubuntu and postgresql10-server-10.5-1PGDG.rhel7 on CentOS.
Doing changes like these works fine:
changing SET clause "SET (a) = " to "SET a = " works fine (delete the brackets)
taking WHERE clause away works fine (delete "WHERE 'X' NOT IN ('Y', 'Z')")
rewriting SET value clause ("(SELECT b FROM t2 WHERE t2.b = t1.a)" to constant "(SELECT b FROM t2 WHERE t2.b = 'A')" works fine
-- regards Eduards Bezverhijs Technical PM & Lead Consultant / Architect @ Tieto Latvia
Hello I can reproduce on my local PostgreSQL 10.6 (Debian 10.6-1.pgdg80+1) But not on 11.1 GDB show backtrace: #0 slot_getattr (slot=0x0, attnum=1, isnull=0x5585781963f8 "") at /build/postgresql-10-svt3W6/postgresql-10-10.6/build/../src/backend/access/common/heaptuple.c:1143 #1 0x00005585774d72fc in ExecEvalExprSwitchContext (isNull=0x5585781963f8 "", econtext=0x5585781977b8, state=<optimized out>) at /build/postgresql-10-svt3W6/postgresql-10-10.6/build/../src/include/executor/executor.h:308 #2 ExecSetParamPlan (node=0x558578197a18, econtext=econtext@entry=0x5585781977b8) at /build/postgresql-10-svt3W6/postgresql-10-10.6/build/../src/backend/executor/nodeSubplan.c:988 #3 0x00005585774ab928 in ExecEvalParamExec (state=state@entry=0x5585781a96d0, op=op@entry=0x5585781a9798, econtext=econtext@entry=0x5585781977b8) at /build/postgresql-10-svt3W6/postgresql-10-10.6/build/../src/backend/executor/execExprInterp.c:1855 #4 0x00005585774ae596 in ExecInterpExpr (state=0x5585781a96d0, econtext=0x5585781977b8, isnull=<optimized out>) at /build/postgresql-10-svt3W6/postgresql-10-10.6/build/../src/backend/executor/execExprInterp.c:1013 #5 0x00005585774d3241 in ExecEvalExprSwitchContext (isNull=0x7ffe0ab21437 "", econtext=0x5585781977b8, state=0x5585781a96d0) at /build/postgresql-10-svt3W6/postgresql-10-10.6/build/../src/include/executor/executor.h:308 #6 ExecProject (projInfo=0x5585781a96c8) at /build/postgresql-10-svt3W6/postgresql-10-10.6/build/../src/include/executor/executor.h:342 #7 ExecResult (pstate=<optimized out>) at /build/postgresql-10-svt3W6/postgresql-10-10.6/build/../src/backend/executor/nodeResult.c:136 #8 0x00005585774cfcbd in ExecProcNode (node=0x558578197908) at /build/postgresql-10-svt3W6/postgresql-10-10.6/build/../src/include/executor/executor.h:250 #9 ExecModifyTable (pstate=0x558578197418) at /build/postgresql-10-svt3W6/postgresql-10-10.6/build/../src/backend/executor/nodeModifyTable.c:1643 #10 0x00005585774b1d23 in ExecProcNode (node=0x558578197418) at /build/postgresql-10-svt3W6/postgresql-10-10.6/build/../src/include/executor/executor.h:250 #11 ExecutePlan (execute_once=<optimized out>, dest=0x558578190158, direction=<optimized out>, numberTuples=0, sendTuples=<optimized out>, operation=CMD_UPDATE, use_parallel_mode=<optimized out>, planstate=0x558578197418, estate=0x5585781961d8) at /build/postgresql-10-svt3W6/postgresql-10-10.6/build/../src/backend/executor/execMain.c:1723 #12 standard_ExecutorRun (queryDesc=0x558578170728, direction=<optimized out>, count=0, execute_once=<optimized out>) at /build/postgresql-10-svt3W6/postgresql-10-10.6/build/../src/backend/executor/execMain.c:364 #13 0x00005585775ef735 in ProcessQuery (plan=<optimized out>, sourceText=0x55857814f438 "UPDATE t1\n SET (a) = (SELECT b FROM t2 WHERE t2.b = t1.a)\n WHERE 'X' NOT IN ('Y','Z');", params=0x0, queryEnv=0x0, dest=0x558578190158, completionTag=0x7ffe0ab218c0 "") at /build/postgresql-10-svt3W6/postgresql-10-10.6/build/../src/backend/tcop/pquery.c:161 #14 0x00005585775ef98b in PortalRunMulti (portal=portal@entry=0x558578180178, isTopLevel=isTopLevel@entry=1 '\001', setHoldSnapshot=setHoldSnapshot@entry=0 '\000', dest=dest@entry=0x558578190158, altdest=altdest@entry=0x558578190158, completionTag=completionTag@entry=0x7ffe0ab218c0 "") at /build/postgresql-10-svt3W6/postgresql-10-10.6/build/../src/backend/tcop/pquery.c:1286 #15 0x00005585775f0645 in PortalRun (portal=portal@entry=0x558578180178, count=count@entry=9223372036854775807, isTopLevel=isTopLevel@entry=1 '\001', run_once=run_once@entry=1 '\001', dest=dest@entry=0x558578190158, altdest=altdest@entry=0x558578190158, completionTag=0x7ffe0ab218c0 "") at /build/postgresql-10-svt3W6/postgresql-10-10.6/build/../src/backend/tcop/pquery.c:799 #16 0x00005585775ec33c in exec_simple_query ( query_string=0x55857814f438 "UPDATE t1\n SET (a) = (SELECT b FROM t2 WHERE t2.b = t1.a)\n WHERE 'X' NOT IN ('Y','Z');") at /build/postgresql-10-svt3W6/postgresql-10-10.6/build/../src/backend/tcop/postgres.c:1122 #17 0x00005585775ee1cd in PostgresMain (argc=<optimized out>, argv=argv@entry=0x5585780fb900, dbname=0x5585780fb7f8 "postgres", username=<optimized out>) at /build/postgresql-10-svt3W6/postgresql-10-10.6/build/../src/backend/tcop/postgres.c:4117 #18 0x0000558577324500 in BackendRun (port=0x5585780f2060) at /build/postgresql-10-svt3W6/postgresql-10-10.6/build/../src/backend/postmaster/postmaster.c:4405 #19 BackendStartup (port=0x5585780f2060) at /build/postgresql-10-svt3W6/postgresql-10-10.6/build/../src/backend/postmaster/postmaster.c:4077 #20 ServerLoop () at /build/postgresql-10-svt3W6/postgresql-10-10.6/build/../src/backend/postmaster/postmaster.c:1755 #21 0x000055857757befd in PostmasterMain (argc=5, argv=<optimized out>) at /build/postgresql-10-svt3W6/postgresql-10-10.6/build/../src/backend/postmaster/postmaster.c:1363 #22 0x00005585773258eb in main (argc=5, argv=0x5585780b3760) at /build/postgresql-10-svt3W6/postgresql-10-10.6/build/../src/backend/main/ regards, Sergei
Bezverhijs Eduards <Eduards.Bezverhijs@tieto.com> writes: > We encountered a bug in our systems with update statement, but long story short, here's the self-containing test case whichresults in segmentation fault. Huh. I can reproduce this in 9.6 and 10, but not earlier or later branches. Looking ... regards, tom lane
I wrote: > Bezverhijs Eduards <Eduards.Bezverhijs@tieto.com> writes: >> We encountered a bug in our systems with update statement, but long story short, here's the self-containing test casewhich results in segmentation fault. > Huh. I can reproduce this in 9.6 and 10, but not earlier or later > branches. Looking ... Ah, I see the problem: v10 generates a plan like this: regression=# explain verbose UPDATE t1 SET (a) = (SELECT b FROM t2 WHERE t2.b = t1.a) WHERE current_user != 'x'; QUERY PLAN ------------------------------------------------------------------------------- Update on public.t1 (cost=0.01..100995.96 rows=2470 width=43) -> Result (cost=0.01..100995.96 rows=2470 width=43) Output: $1, ((SubPlan 1 (returns $1))), t1.ctid One-Time Filter: (CURRENT_USER <> 'x'::name) -> Seq Scan on public.t1 (cost=0.01..100995.96 rows=2470 width=43) Output: $1, (SubPlan 1 (returns $1)), t1.ctid SubPlan 1 (returns $1) -> Seq Scan on public.t2 (cost=0.00..40.88 rows=12 width=5) Output: t2.b Filter: ((t2.b)::text = (t1.a)::text) (10 rows) The implementation of multiassignments assumes that Params referencing the output of a multiassignment subplan will appear in the same targetlist as the SubPlan node for that multiassignment. Here, that's valid in the SeqScan's tlist, but the Result has another occurrence of the same Param, and that one is misplaced. That's because fix_upper_expr_mutator is doing things in the wrong order, causing it to emit a naked Param where it should emit a Var referencing the Param output from the lower plan node. I think it's just accidental that v11 and HEAD don't show the same problem. We've refactored where the main tlist evaluation happens: regression=# explain verbose UPDATE t1 SET (a) = (SELECT b FROM t2 WHERE t2.b = t1.a) WHERE current_user != 'x'; QUERY PLAN -------------------------------------------------------------------------- Update on public.t1 (cost=0.01..100995.96 rows=2470 width=43) -> Result (cost=0.01..100995.96 rows=2470 width=43) Output: $1, (SubPlan 1 (returns $1)), t1.ctid One-Time Filter: (CURRENT_USER <> 'x'::name) -> Seq Scan on public.t1 (cost=0.01..34.70 rows=2470 width=11) Output: t1.a, t1.ctid SubPlan 1 (returns $1) -> Seq Scan on public.t2 (cost=0.00..40.88 rows=12 width=5) Output: t2.b Filter: ((t2.b)::text = (t1.a)::text) (10 rows) but if there were somehow another plan node in between, it'd be just as broken. The same is true of 9.5. (Note: in these examples, I replaced your constant-true WHERE clause with "current_user != 'x'", because v11+ are smart enough to fold the NOT IN to constant true and then not generate a Result at all, obscuring the issue.) regards, tom lane
Hi, On 2018-12-12 11:13:17 -0500, Tom Lane wrote: > Bezverhijs Eduards <Eduards.Bezverhijs@tieto.com> writes: > > We encountered a bug in our systems with update statement, but long story short, here's the self-containing test casewhich results in segmentation fault. > > Huh. I can reproduce this in 9.6 and 10, but not earlier or later > branches. Looking ... Based on a relatively quick look it looks to me that crashing/no crashing is a question of plan shape, rather than a bugfix. The subplan is: {TARGETENTRY :expr {SUBPLAN :subLinkType 5 :testexpr <> :paramIds <> :plan_id 1 :plan_name SubPlan\ 1\ \(returns\ $1\) :firstColType 1043 :firstColTypmod 5 :firstColCollation 100 :useHashTable false :unknownEqFalse false :parallel_safe false :setParam (i 1) :parParam (i 0) :args ( {VAR :varno 1 :varattno 1 :vartype 1043 :vartypmod 5 :varcollid 100 :varlevelsup 0 :varnoold 1 :varoattno 1 :location 55 } ) :startup_cost 0.00 :per_call_cost 38.25 } :resno 2 :resname <> :ressortgroupref 0 :resorigtbl 0 :resorigcol 0 :resjunk true } with varno still referencing a specific varno, rather than INNER/OUTER. That works for master: ┌─────────────────────────────────────────────────────────────────────────┐ │ QUERY PLAN │ ├─────────────────────────────────────────────────────────────────────────┤ │ Update on public.t1 (cost=0.00..86477.60 rows=2260 width=46) │ │ -> Seq Scan on public.t1 (cost=0.00..86477.60 rows=2260 width=46) │ │ Output: $1, (SubPlan 1 (returns $1)), t1.ctid │ │ SubPlan 1 (returns $1) │ │ -> Seq Scan on public.t2 (cost=0.00..38.25 rows=11 width=8) │ │ Output: t2.b │ │ Filter: ((t2.b)::text = (t1.a)::text) │ └─────────────────────────────────────────────────────────────────────────┘ where the subplan is executed as a child of a seqscan, which thus has ecxt_scantuple set. But in 10: ┌───────────────────────────────────────────────────────────────────────────────┐ │ QUERY PLAN │ ├───────────────────────────────────────────────────────────────────────────────┤ │ Update on public.t1 (cost=0.00..86477.60 rows=2260 width=46) │ │ -> Result (cost=0.00..86477.60 rows=2260 width=46) │ │ Output: $1, ((SubPlan 1 (returns $1))), t1.ctid │ │ One-Time Filter: ('X'::text <> ALL ('{Y,Z}'::text[])) │ │ -> Seq Scan on public.t1 (cost=0.00..86477.60 rows=2260 width=46) │ │ Output: $1, (SubPlan 1 (returns $1)), t1.ctid │ │ SubPlan 1 (returns $1) │ │ -> Seq Scan on public.t2 (cost=0.00..38.25 rows=11 width=8) │ │ Output: t2.b │ │ Filter: ((t2.b)::text = (t1.a)::text) │ └───────────────────────────────────────────────────────────────────────────────┘ the subplan is executed below a Result node, which won't have scantuple set. Therefore we crash. Looking as to why that reference isn't corrected. Greetings, Andres Freund
Andres Freund <andres@anarazel.de> writes: > Looking as to why that reference isn't corrected. It's the bare Param that's the problem; the planner should've converted that to a Var referencing the Param evaluated in the child plan node. Working on patch now. regards, tom lane