Thread: array_agg(DISTINCT) caused a segmentation fault
Hi,

In the current master branch, with enable_presorted_aggregate = on, I got a segmentation fault when executing the following query. OTOH, the query didn't cause a segmentation fault when enable_presorted_aggregate was disabled.

=# SELECT array_agg(distinct val) FROM (SELECT NULL AS val FROM generate_series(1, 2)) hoge;

LOG: server process (PID 76507) was terminated by signal 11: Segmentation fault: 11
DETAIL: Failed process was running: SELECT array_agg(distinct val) FROM (SELECT NULL AS val FROM generate_series(1, 2)) hoge;

The backtrace extracted from the core file is:

* thread #1
  * frame #0: 0x000000010815807d postgres`toast_raw_datum_size(value=0) at detoast.c:550:6
    frame #1: 0x000000010891d31b postgres`texteq(fcinfo=0x00007ff7b7dc06b8) at varlena.c:1804:10
    frame #2: 0x0000000108975607 postgres`FunctionCall2Coll(flinfo=0x00007fd46900e0d8, collation=100, arg1=0, arg2=0) at fmgr.c:1148:11
    frame #3: 0x000000010846bee0 postgres`ExecEvalPreOrderedDistinctSingle(aggstate=0x00007fd46900c548, pertrans=0x00007fd46900dff0) at execExprInterp.c:4253:17
    frame #4: 0x00000001084668a6 postgres`ExecInterpExpr(state=0x00007fd46908cab8, econtext=0x00007fd46900c970, isnull=0x00007ff7b7dc09d7) at execExprInterp.c:1772:8
    frame #5: 0x000000010849804b postgres`ExecEvalExprSwitchContext(state=0x00007fd46908cab8, econtext=0x00007fd46900c970, isNull=0x00007ff7b7dc09d7) at executor.h:344:13
    frame #6: 0x00000001084974ff postgres`advance_aggregates(aggstate=0x00007fd46900c548) at nodeAgg.c:823:2
    frame #7: 0x0000000108496ff1 postgres`agg_retrieve_direct(aggstate=0x00007fd46900c548) at nodeAgg.c:2446:6
    frame #8: 0x000000010849428b postgres`ExecAgg(pstate=0x00007fd46900c548) at nodeAgg.c:2171:14
    frame #9: 0x0000000108480502 postgres`ExecProcNodeFirst(node=0x00007fd46900c548) at execProcnode.c:464:9
    frame #10: 0x0000000108477f42 postgres`ExecProcNode(node=0x00007fd46900c548) at executor.h:262:9
    frame #11: 0x0000000108473351 postgres`ExecutePlan(estate=0x00007fd46900c318, planstate=0x00007fd46900c548, use_parallel_mode=false, operation=CMD_SELECT, sendTuples=true, numberTuples=0, direction=ForwardScanDirection, dest=0x00007fd46908a4e0, execute_once=true) at execMain.c:1633:10
    frame #12: 0x000000010847320b postgres`standard_ExecutorRun(queryDesc=0x00007fd469009318, direction=ForwardScanDirection, count=0, execute_once=true) at execMain.c:364:3
    frame #13: 0x0000000108472fc2 postgres`ExecutorRun(queryDesc=0x00007fd469009318, direction=ForwardScanDirection, count=0, execute_once=true) at execMain.c:308:3
    frame #14: 0x0000000108752794 postgres`PortalRunSelect(portal=0x00007fd469031718, forward=true, count=0, dest=0x00007fd46908a4e0) at pquery.c:924:4
    frame #15: 0x0000000108752179 postgres`PortalRun(portal=0x00007fd469031718, count=9223372036854775807, isTopLevel=true, run_once=true, dest=0x00007fd46908a4e0, altdest=0x00007fd46908a4e0, qc=0x00007ff7b7dc0df0) at pquery.c:768:18
    frame #16: 0x000000010874d5a2 postgres`exec_simple_query(query_string="SELECT array_agg(distinct val) FROM (SELECT NULL AS val FROM generate_series(1, 2)) hoge;") at postgres.c:1237:10
    frame #17: 0x000000010874c6de postgres`PostgresMain(dbname="postgres", username="postgres") at postgres.c:4565:7
    frame #18: 0x000000010865c7c2 postgres`BackendRun(port=0x00007fd468404080) at postmaster.c:4461:2
    frame #19: 0x000000010865a09c postgres`BackendStartup(port=0x00007fd468404080) at postmaster.c:4189:3
    frame #20: 0x0000000108657a7e postgres`ServerLoop at postmaster.c:1779:6
    frame #21: 0x00000001086566d0 postgres`PostmasterMain(argc=3, argv=0x0000600001635260) at postmaster.c:1463:11
    frame #22: 0x0000000108506b27 postgres`main(argc=3, argv=0x0000600001635260) at main.c:200:3
    frame #23: 0x000000011202552e dyld`start + 462

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

On Mon, 13 Feb 2023 at 18:29, Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
> =# SELECT array_agg(distinct val) FROM (SELECT NULL AS val FROM generate_series(1, 2)) hoge;
>
> LOG: server process (PID 76507) was terminated by signal 11: Segmentation fault: 11

Thanks for the report. Looks like mine as there's no crash with:

set enable_presorted_aggregate=0;

David

On Mon, 13 Feb 2023 at 18:29, Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
> =# SELECT array_agg(distinct val) FROM (SELECT NULL AS val FROM generate_series(1, 2)) hoge;
> LOG: server process (PID 76507) was terminated by signal 11: Segmentation fault: 11
> DETAIL: Failed process was running: SELECT array_agg(distinct val) FROM (SELECT NULL AS val FROM generate_series(1, 2)) hoge;

This was a fairly trivial logic bug in ExecEvalPreOrderedDistinctSingle(). The JIT code calls that same function and ExecEvalPreOrderedDistinctMulti() uses the standard expression evaluation logic. So looks like the problem is just isolated to ExecEvalPreOrderedDistinctSingle().

I've now pushed a fix for it and included your test. To get it to crash it needed to be a byref aggregate without a strict transition function. There are not too many of those, which is probably why nobody noticed this before.

David

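The condition described above (a by-reference input whose transition function is not strict, so NULLs reach the distinct check) can be illustrated outside PostgreSQL. What follows is a minimal standalone sketch in C, not the PostgreSQL source: DistinctState and is_new_distinct are hypothetical names, and strcmp merely stands in for the type's equality function. It only shows the general idea of comparing the null flags first, so that the equality function is never handed a NULL pointer, as happened with texteq(arg1=0, arg2=0) in the backtrace.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for per-aggregate state; not PostgreSQL's struct. */
typedef struct
{
    bool        has_last;       /* have we seen any value yet? */
    bool        last_is_null;   /* was the previous value NULL? */
    const char *last;           /* previous by-reference value, NULL if null */
} DistinctState;

/*
 * Input is assumed to arrive presorted. Returns true when the value differs
 * from the previous one and should be passed on to the transition function.
 */
bool
is_new_distinct(DistinctState *st, const char *value, bool is_null)
{
    assert(st != NULL);

    if (!st->has_last ||
        st->last_is_null != is_null ||
        /* Only dereference the values when both are known to be non-null. */
        (!is_null && strcmp(st->last, value) != 0))
    {
        st->has_last = true;
        st->last_is_null = is_null;
        st->last = value;
        return true;
    }

    return false;
}
```

Without the `!is_null &&` guard, two consecutive NULL inputs would fall through to the comparison with both pointers NULL, which is the shape of the reported query (two rows, both NULL).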
On 2023/02/13 16:44, David Rowley wrote:
> On Mon, 13 Feb 2023 at 18:29, Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
>> =# SELECT array_agg(distinct val) FROM (SELECT NULL AS val FROM generate_series(1, 2)) hoge;
>> LOG: server process (PID 76507) was terminated by signal 11: Segmentation fault: 11
>> DETAIL: Failed process was running: SELECT array_agg(distinct val) FROM (SELECT NULL AS val FROM generate_series(1, 2)) hoge;
>
> This was a fairly trivial logic bug in
> ExecEvalPreOrderedDistinctSingle(). The JIT code calls that same
> function and ExecEvalPreOrderedDistinctMulti() uses the standard
> expression evaluation logic. So looks like the problem is just
> isolated to ExecEvalPreOrderedDistinctSingle().
>
> I've now pushed a fix for it and included your test. To get it to
> crash it needed to be a byref aggregate without a strict transition
> function. There are not too many of those, which is probably why
> nobody noticed this before.

Thanks for the fix!

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

Hello David,
13.02.2023 10:44, David Rowley wrote:
> On Mon, 13 Feb 2023 at 18:29, Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
>> =# SELECT array_agg(distinct val) FROM (SELECT NULL AS val FROM generate_series(1, 2)) hoge;
>> LOG: server process (PID 76507) was terminated by signal 11: Segmentation fault: 11
>> DETAIL: Failed process was running: SELECT array_agg(distinct val) FROM (SELECT NULL AS val FROM generate_series(1, 2)) hoge;
>
> I've now pushed a fix for it and included your test. To get it to crash it needed to be a byref aggregate without a strict transition function. There are not too many of those, which is probably why nobody noticed this before.

I've encountered an issue that could have the same title, but it still reproduces after the fix.
The following query:
SELECT array_agg(DISTINCT a ORDER BY a DESC)
FROM (VALUES (1),(1.0),(NULL)) v(a);
triggers an error detected by Valgrind:
==00:00:00:03.708 2686358== Invalid read of size 4
==00:00:00:03.708 2686358== at 0x76C4AE: GetMemoryChunkMethodID (mcxt.c:195)
==00:00:00:03.708 2686358== by 0x76C4AE: pfree (mcxt.c:1439)
==00:00:00:03.708 2686358== by 0x3FD547: ExecEvalPreOrderedDistinctSingle (execExprInterp.c:4258)
==00:00:00:03.708 2686358== by 0x3FF203: ExecInterpExpr (execExprInterp.c:1772)
==00:00:00:03.708 2686358== by 0x418792: ExecEvalExprSwitchContext (executor.h:344)
==00:00:00:03.708 2686358== by 0x418792: advance_aggregates (nodeAgg.c:823)
==00:00:00:03.708 2686358== by 0x41A12A: agg_retrieve_direct (nodeAgg.c:2446)
==00:00:00:03.708 2686358== by 0x41A294: ExecAgg (nodeAgg.c:2171)
==00:00:00:03.708 2686358== by 0x40AD3F: ExecProcNodeFirst (execProcnode.c:464)
==00:00:00:03.708 2686358== by 0x40337F: ExecProcNode (executor.h:262)
==00:00:00:03.708 2686358== by 0x40337F: ExecutePlan (execMain.c:1633)
==00:00:00:03.708 2686358== by 0x403542: standard_ExecutorRun (execMain.c:364)
==00:00:00:03.708 2686358== by 0x40360E: ExecutorRun (execMain.c:308)
==00:00:00:03.708 2686358== by 0x5EB971: PortalRunSelect (pquery.c:924)
==00:00:00:03.708 2686358== by 0x5ED31B: PortalRun (pquery.c:768)
==00:00:00:03.708 2686358== Address 0xfffffffffffffff8 is not stack'd, malloc'd or (recently) free'd
==00:00:00:03.708 2686358==
...
==00:00:00:03.708 2686358==
==00:00:00:03.708 2686358== Exit program on first error (--exit-on-first-error=yes)
2023-02-13 10:26:39.276 MSK [2686332] LOG: server process (PID 2686358) exited with exit code 1
2023-02-13 10:26:39.276 MSK [2686332] DETAIL: Failed process was running: SELECT array_agg(DISTINCT a ORDER BY a DESC)
FROM (VALUES (1),(1.0),(NULL)) v(a);
(Without valgrind I get SIGSEGV here.)
The first bad commit is 1349d2790 again (but before 80ef92675 an assertion failure can be seen).
Best regards,
Alexander

On Mon, 13 Feb 2023 at 23:00, Alexander Lakhin <exclusion@gmail.com> wrote:
> I've encountered an issue that could have the same title but it still reproduced after the fix.
> The following query:
> SELECT array_agg(DISTINCT a ORDER BY a DESC)
> FROM (VALUES (1),(1.0),(NULL)) v(a);

Thanks for testing that. I neglected to update the logic which pfrees the old Datum, which (as of 7da51590e) may now be NULL.

I've just pushed a fix.

David

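The invalid read at address 0xfffffffffffffff8 in the Valgrind report is what one would expect when an allocator looks for a chunk header just before the pointer it is asked to free and that pointer is NULL. The following is a minimal standalone sketch in C of guarding the free with the null flag. It assumes a toy allocator with a header in front of each chunk; chunk_alloc, chunk_free, LastValue and remember_value are hypothetical names for illustration, not PostgreSQL functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Toy chunk header stored immediately before each allocation (assumption). */
typedef struct
{
    size_t      size;
} ChunkHeader;

void *
chunk_alloc(size_t size)
{
    ChunkHeader *hdr = malloc(sizeof(ChunkHeader) + size);

    if (hdr == NULL)
        abort();
    hdr->size = size;
    return hdr + 1;             /* caller sees the memory after the header */
}

void
chunk_free(void *ptr)
{
    /* The header lives before ptr, so a NULL ptr must never get here. */
    assert(ptr != NULL);
    free((ChunkHeader *) ptr - 1);
}

/* Hypothetical "previous value" state for a by-reference input. */
typedef struct
{
    bool        has_last;
    bool        last_is_null;
    char       *last;           /* owned copy, NULL when last_is_null */
} LastValue;

/* Remember a new value, releasing the old copy only if there is one. */
void
remember_value(LastValue *st, const char *value, bool is_null)
{
    assert(st != NULL);

    /* The null flag is part of the guard: a NULL "last" owns no chunk. */
    if (st->has_last && !st->last_is_null)
        chunk_free(st->last);

    st->has_last = true;
    st->last_is_null = is_null;
    st->last = NULL;

    if (!is_null)
    {
        size_t      len = strlen(value) + 1;

        st->last = chunk_alloc(len);
        memcpy(st->last, value, len);
    }
}
```

In this sketch, dropping `!st->last_is_null` from the guard would pass NULL to chunk_free as soon as a non-null value follows a NULL one.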
13.02.2023 13:41, David Rowley wrote:
> On Mon, 13 Feb 2023 at 23:00, Alexander Lakhin <exclusion@gmail.com> wrote:
> ...
> Thanks for testing that. I neglected to update the logic which pfrees
> the old Datum, which (as of 7da51590e) may now be NULL.
>
> I've just pushed a fix.

Thanks! The issue is not reproduced now.

Best regards,
Alexander