Thread: v16dev: invalid memory alloc request size 8488348128
I hit this elog() while testing reports under v16 and changed to PANIC
to help diagnose.

DETAILS: PANIC: invalid memory alloc request size 18446744072967930808
CONTEXT: PL/pgSQL function array_weight(real[],real[]) while storing call arguments into local variables

I can't share the query, data, nor plpgsql functions themselves.

I reproduced the problem at this commit, but not at its parent.

commit 42b746d4c982257bf3f924176632b04dc288174b (HEAD)
Author: Tom Lane <tgl@sss.pgh.pa.us>
Date: Thu Oct 6 13:27:34 2022 -0400

    Remove uses of MemoryContextContains in nodeAgg.c and nodeWindowAgg.c.

#2 0x0000000001067af5 in errfinish (filename=filename@entry=0x168f1e0 "../src/backend/utils/mmgr/mcxt.c", lineno=lineno@entry=1013, funcname=funcname@entry=0x16901a0 <__func__.17850> "MemoryContextAlloc") at ../src/backend/utils/error/elog.c:604
#3 0x00000000010c57c7 in MemoryContextAlloc (context=context@entry=0x604200032600, size=size@entry=8488348128) at ../src/backend/utils/mmgr/mcxt.c:1013
#4 0x0000000000db49a4 in copy_byval_expanded_array (eah=eah@entry=0x604200032718, oldeah=0x604200032718) at ../src/backend/utils/adt/array_expanded.c:195
#5 0x0000000000db5f7a in expand_array (arraydatum=105836584314672, parentcontext=<optimized out>, metacache=0x7ffcbd2d29c0, metacache@entry=0x0) at ../src/backend/utils/adt/array_expanded.c:104
#6 0x00007f6c05a6b4d0 in plpgsql_exec_function (func=func@entry=0x6092004a4c58, fcinfo=fcinfo@entry=0x7f6c04f7efc8, simple_eval_estate=simple_eval_estate@entry=0x0, simple_eval_resowner=simple_eval_resowner@entry=0x0, procedure_resowner=procedure_resowner@entry=0x0, atomic=atomic@entry=true) at ../src/pl/plpgsql/src/pl_exec.c:556
#7 0x00007f6c05a76af4 in plpgsql_call_handler (fcinfo=<optimized out>) at ../src/pl/plpgsql/src/pl_handler.c:277
#8 0x00000000008b30cd in ExecInterpExpr (state=0x7f6c04fd6750, econtext=0x6072000712d0, isnull=0x7ffcbd2d2fa0) at ../src/backend/executor/execExprInterp.c:733
#9 0x00000000008a6c5f in ExecInterpExprStillValid (state=0x7f6c04fd6750, econtext=0x6072000712d0, isNull=0x7ffcbd2d2fa0) at ../src/backend/executor/execExprInterp.c:1858
#10 0x000000000090032b in ExecEvalExprSwitchContext (isNull=0x7ffcbd2d2fa0, econtext=0x6072000712d0, state=0x7f6c04fd6750) at ../src/include/executor/executor.h:354
#11 ExecProject (projInfo=0x7f6c04fd6748) at ../src/include/executor/executor.h:388
#12 project_aggregates (aggstate=aggstate@entry=0x607200070d38) at ../src/backend/executor/nodeAgg.c:1377
#13 0x0000000000903eb6 in agg_retrieve_direct (aggstate=aggstate@entry=0x607200070d38) at ../src/backend/executor/nodeAgg.c:2520
#14 0x0000000000904074 in ExecAgg (pstate=0x607200070d38) at ../src/backend/executor/nodeAgg.c:2172
#15 0x00000000008d90e0 in ExecProcNodeFirst (node=0x607200070d38) at ../src/backend/executor/execProcnode.c:464
#16 0x00000000008c1e5f in ExecProcNode (node=0x607200070d38) at ../src/include/executor/executor.h:272
#17 ExecutePlan (estate=estate@entry=0x607200070a18, planstate=0x607200070d38, use_parallel_mode=false, operation=operation@entry=CMD_SELECT, sendTuples=true, numberTuples=numberTuples@entry=0, direction=direction@entry=ForwardScanDirection, dest=dest@entry=0x7f6c051abd28, execute_once=execute_once@entry=true) at ../src/backend/executor/execMain.c:1640
#18 0x00000000008c3ffb in standard_ExecutorRun (queryDesc=0x604200016998, direction=ForwardScanDirection, count=0, execute_once=<optimized out>) at ../src/backend/executor/execMain.c:365
#19 0x00000000008c4125 in ExecutorRun (queryDesc=queryDesc@entry=0x604200016998, direction=direction@entry=ForwardScanDirection, count=count@entry=0, execute_once=<optimized out>) at ../src/backend/executor/execMain.c:309
#20 0x0000000000d5d148 in PortalRunSelect (portal=portal@entry=0x607200028a18, forward=forward@entry=true, count=0, count@entry=9223372036854775807, dest=dest@entry=0x7f6c051abd28) at ../src/backend/tcop/pquery.c:924
#21 0x0000000000d60dc8 in PortalRun (portal=portal@entry=0x607200028a18, count=count@entry=9223372036854775807, isTopLevel=isTopLevel@entry=true, run_once=run_once@entry=true, dest=dest@entry=0x7f6c051abd28, altdest=altdest@entry=0x7f6c051abd28, qc=<optimized out>, qc@entry=0x7ffcbd2d3580) at ../src/backend/tcop/pquery.c:768
#22 0x0000000000d595fd in exec_simple_query (query_string=query_string@entry=0x6082000cf238 "...
#23 0x0000000000d5c72c in PostgresMain (dbname=dbname@entry=0x60820000b378 "postgres", username=username@entry=0x60820000b358 "telsasoft") at ../src/backend/tcop/postgres.c:4632
#24 0x0000000000bddc19 in BackendRun (port=port@entry=0x60300000fc40) at ../src/backend/postmaster/postmaster.c:4461
#25 0x0000000000be2583 in BackendStartup (port=port@entry=0x60300000fc40) at ../src/backend/postmaster/postmaster.c:4189
#26 0x0000000000be2a05 in ServerLoop () at ../src/backend/postmaster/postmaster.c:1779
#27 0x0000000000be436b in PostmasterMain (argc=argc@entry=9, argv=argv@entry=0x600e0000df40) at ../src/backend/postmaster/postmaster.c:1463
#28 0x00000000009c33d5 in main (argc=9, argv=0x600e0000df40) at ../src/backend/main/main.c:200

(gdb) fr 4
#4 0x0000000000db49a4 in copy_byval_expanded_array (eah=eah@entry=0x604200032718, oldeah=0x604200032718) at ../src/backend/utils/adt/array_expanded.c:195
195 eah->dims = (int *) MemoryContextAlloc(objcxt, ndims * 2 * sizeof(int));
(gdb) p ndims
$1 = 1061043516

--
Justin
On Sat, 15 Apr 2023 at 08:36, Justin Pryzby <pryzby@telsasoft.com> wrote:
>
> I hit this elog() while testing reports under v16 and changed to PANIC
> to help diagnose.
>
> DETAILS: PANIC: invalid memory alloc request size 18446744072967930808
> CONTEXT: PL/pgSQL function array_weight(real[],real[]) while storing call arguments into local variables
>
> I can't share the query, data, nor plpgsql functions themselves.

Which aggregate function is being called here? Is it a custom
aggregate written in C, by any chance?

David
On Sat, Apr 15, 2023 at 10:04:52AM +1200, David Rowley wrote:
> On Sat, 15 Apr 2023 at 08:36, Justin Pryzby <pryzby@telsasoft.com> wrote:
> >
> > I hit this elog() while testing reports under v16 and changed to PANIC
> > to help diagnose.
> >
> > DETAILS: PANIC: invalid memory alloc request size 18446744072967930808
> > CONTEXT: PL/pgSQL function array_weight(real[],real[]) while storing call arguments into local variables
> >
> > I can't share the query, data, nor plpgsql functions themselves.
>
> Which aggregate function is being called here? Is it a custom
> aggregate written in C, by any chance?

That function is not an aggregate:

ts=# \sf array_weight
CREATE OR REPLACE FUNCTION public.array_weight(real[], real[])
 RETURNS real
 LANGUAGE plpgsql
 IMMUTABLE PARALLEL SAFE

And we don't have any C code loaded to postgres. We do have polymorphic
aggregate functions using anycompatiblearray [*], and array_weight is
being called several times with those aggregates as its arguments.

*As in:
9e38c2bb5093ceb0c04d6315ccd8975bd17add66
97f73a978fc1aca59c6ad765548ce0096d95a923
09878cdd489ff7aca761998e7cb104f4fd98ae02
On Sat, 15 Apr 2023 at 10:48, Justin Pryzby <pryzby@telsasoft.com> wrote:
>
> On Sat, Apr 15, 2023 at 10:04:52AM +1200, David Rowley wrote:
> > Which aggregate function is being called here? Is it a custom
> > aggregate written in C, by any chance?
>
> That function is not an aggregate:

There's an aggregate somewhere as indicated by this fragment from the
stack trace:

> #12 project_aggregates (aggstate=aggstate@entry=0x607200070d38) at ../src/backend/executor/nodeAgg.c:1377
> #13 0x0000000000903eb6 in agg_retrieve_direct (aggstate=aggstate@entry=0x607200070d38) at ../src/backend/executor/nodeAgg.c:2520
> #14 0x0000000000904074 in ExecAgg (pstate=0x607200070d38) at ../src/backend/executor/nodeAgg.c:2172

Any chance you could try and come up with a minimal reproducer? You
have access to see which aggregates are being used here and what data
types are being given to them and then what's being done with the
return value of that aggregate that's causing the crash. Maybe you
can still get the crash if you mock up some data to aggregate and
strip out the guts from the plpgsql functions that we're crashing on?

David
David Rowley <dgrowleyml@gmail.com> writes:
> Any chance you could try and come up with a minimal reproducer?

Yeah --- there's an awful lot of moving parts there, and a stack trace
is not much to go on.

regards, tom lane
Maybe you'll find valgrind errors to be helpful. ==17971== Source and destination overlap in memcpy(0x1eb8c078, 0x1d88cb20, 123876054) ==17971== at 0x4C2E81D: memcpy@@GLIBC_2.14 (vg_replace_strmem.c:1035) ==17971== by 0x9C705A: memcpy (string3.h:51) ==17971== by 0x9C705A: pg_detoast_datum_copy (fmgr.c:1823) ==17971== by 0x8952F8: expand_array (array_expanded.c:131) ==17971== by 0x1E971A28: plpgsql_exec_function (pl_exec.c:556) ==17971== by 0x1E97CF83: plpgsql_call_handler (pl_handler.c:277) ==17971== by 0x6BFA4E: ExecInterpExpr (execExprInterp.c:733) ==17971== by 0x6D9C8C: ExecEvalExprSwitchContext (executor.h:354) ==17971== by 0x6D9C8C: ExecProject (executor.h:388) ==17971== by 0x6D9C8C: project_aggregates (nodeAgg.c:1377) ==17971== by 0x6DB2B4: agg_retrieve_direct (nodeAgg.c:2520) ==17971== by 0x6DB2B4: ExecAgg (nodeAgg.c:2172) ==17971== by 0x6C4821: ExecProcNode (executor.h:272) ==17971== by 0x6C4821: ExecutePlan (execMain.c:1640) ==17971== by 0x6C4821: standard_ExecutorRun (execMain.c:365) ==17971== by 0x870535: PortalRunSelect (pquery.c:924) ==17971== by 0x871CCE: PortalRun (pquery.c:768) ==17971== by 0x86D552: exec_simple_query (postgres.c:1274) ==17971== Invalid read of size 8 ==17971== at 0x4C2EA20: memcpy@@GLIBC_2.14 (vg_replace_strmem.c:1035) ==17971== by 0x9C705A: memcpy (string3.h:51) ==17971== by 0x9C705A: pg_detoast_datum_copy (fmgr.c:1823) ==17971== by 0x8952F8: expand_array (array_expanded.c:131) ==17971== by 0x1E971A28: plpgsql_exec_function (pl_exec.c:556) ==17971== by 0x1E97CF83: plpgsql_call_handler (pl_handler.c:277) ==17971== by 0x6BFA4E: ExecInterpExpr (execExprInterp.c:733) ==17971== by 0x6D9C8C: ExecEvalExprSwitchContext (executor.h:354) ==17971== by 0x6D9C8C: ExecProject (executor.h:388) ==17971== by 0x6D9C8C: project_aggregates (nodeAgg.c:1377) ==17971== by 0x6DB2B4: agg_retrieve_direct (nodeAgg.c:2520) ==17971== by 0x6DB2B4: ExecAgg (nodeAgg.c:2172) ==17971== by 0x6C4821: ExecProcNode (executor.h:272) ==17971== by 0x6C4821: ExecutePlan 
(execMain.c:1640) ==17971== by 0x6C4821: standard_ExecutorRun (execMain.c:365) ==17971== by 0x870535: PortalRunSelect (pquery.c:924) ==17971== by 0x871CCE: PortalRun (pquery.c:768) ==17971== by 0x86D552: exec_simple_query (postgres.c:1274) ==17971== Address 0x1eb8c038 is 8 bytes before a block of size 123,876,112 alloc'd ==17971== at 0x4C29F73: malloc (vg_replace_malloc.c:309) ==17971== by 0x9E4204: AllocSetAlloc (aset.c:732) ==17971== by 0x9ED5BD: palloc (mcxt.c:1224) ==17971== by 0x9C704C: pg_detoast_datum_copy (fmgr.c:1821) ==17971== by 0x8952F8: expand_array (array_expanded.c:131) ==17971== by 0x1E971A28: plpgsql_exec_function (pl_exec.c:556) ==17971== by 0x1E97CF83: plpgsql_call_handler (pl_handler.c:277) ==17971== by 0x6BFA4E: ExecInterpExpr (execExprInterp.c:733) ==17971== by 0x6D9C8C: ExecEvalExprSwitchContext (executor.h:354) ==17971== by 0x6D9C8C: ExecProject (executor.h:388) ==17971== by 0x6D9C8C: project_aggregates (nodeAgg.c:1377) ==17971== by 0x6DB2B4: agg_retrieve_direct (nodeAgg.c:2520) ==17971== by 0x6DB2B4: ExecAgg (nodeAgg.c:2172) ==17971== by 0x6C4821: ExecProcNode (executor.h:272) ==17971== by 0x6C4821: ExecutePlan (execMain.c:1640) ==17971== by 0x6C4821: standard_ExecutorRun (execMain.c:365) ==17971== by 0x870535: PortalRunSelect (pquery.c:924) ==17971== Invalid read of size 8 ==17971== at 0x4C2EA28: memcpy@@GLIBC_2.14 (vg_replace_strmem.c:1035) ==17971== by 0x9C705A: memcpy (string3.h:51) ==17971== by 0x9C705A: pg_detoast_datum_copy (fmgr.c:1823) ==17971== by 0x8952F8: expand_array (array_expanded.c:131) ==17971== by 0x1E971A28: plpgsql_exec_function (pl_exec.c:556) ==17971== by 0x1E97CF83: plpgsql_call_handler (pl_handler.c:277) ==17971== by 0x6BFA4E: ExecInterpExpr (execExprInterp.c:733) ==17971== by 0x6D9C8C: ExecEvalExprSwitchContext (executor.h:354) ==17971== by 0x6D9C8C: ExecProject (executor.h:388) ==17971== by 0x6D9C8C: project_aggregates (nodeAgg.c:1377) ==17971== by 0x6DB2B4: agg_retrieve_direct (nodeAgg.c:2520) ==17971== by 
0x6DB2B4: ExecAgg (nodeAgg.c:2172) ==17971== by 0x6C4821: ExecProcNode (executor.h:272) ==17971== by 0x6C4821: ExecutePlan (execMain.c:1640) ==17971== by 0x6C4821: standard_ExecutorRun (execMain.c:365) ==17971== by 0x870535: PortalRunSelect (pquery.c:924) ==17971== by 0x871CCE: PortalRun (pquery.c:768) ==17971== by 0x86D552: exec_simple_query (postgres.c:1274) ==17971== Address 0x1eb8c030 is 16 bytes before a block of size 123,876,112 alloc'd ==17971== at 0x4C29F73: malloc (vg_replace_malloc.c:309) ==17971== by 0x9E4204: AllocSetAlloc (aset.c:732) ==17971== by 0x9ED5BD: palloc (mcxt.c:1224) ==17971== by 0x9C704C: pg_detoast_datum_copy (fmgr.c:1821) ==17971== by 0x8952F8: expand_array (array_expanded.c:131) ==17971== by 0x1E971A28: plpgsql_exec_function (pl_exec.c:556) ==17971== by 0x1E97CF83: plpgsql_call_handler (pl_handler.c:277) ==17971== by 0x6BFA4E: ExecInterpExpr (execExprInterp.c:733) ==17971== by 0x6D9C8C: ExecEvalExprSwitchContext (executor.h:354) ==17971== by 0x6D9C8C: ExecProject (executor.h:388) ==17971== by 0x6D9C8C: project_aggregates (nodeAgg.c:1377) ==17971== by 0x6DB2B4: agg_retrieve_direct (nodeAgg.c:2520) ==17971== by 0x6DB2B4: ExecAgg (nodeAgg.c:2172) ==17971== by 0x6C4821: ExecProcNode (executor.h:272) ==17971== by 0x6C4821: ExecutePlan (execMain.c:1640) ==17971== by 0x6C4821: standard_ExecutorRun (execMain.c:365) ==17971== by 0x870535: PortalRunSelect (pquery.c:924) ==17971== Invalid read of size 8 ==17971== at 0x4C2EA0C: memcpy@@GLIBC_2.14 (vg_replace_strmem.c:1035) ==17971== by 0x9C705A: memcpy (string3.h:51) ==17971== by 0x9C705A: pg_detoast_datum_copy (fmgr.c:1823) ==17971== by 0x8952F8: expand_array (array_expanded.c:131) ==17971== by 0x1E971A28: plpgsql_exec_function (pl_exec.c:556) ==17971== by 0x1E97CF83: plpgsql_call_handler (pl_handler.c:277) ==17971== by 0x6BFA4E: ExecInterpExpr (execExprInterp.c:733) ==17971== by 0x6D9C8C: ExecEvalExprSwitchContext (executor.h:354) ==17971== by 0x6D9C8C: ExecProject (executor.h:388) ==17971== by 
0x6D9C8C: project_aggregates (nodeAgg.c:1377) ==17971== by 0x6DB2B4: agg_retrieve_direct (nodeAgg.c:2520) ==17971== by 0x6DB2B4: ExecAgg (nodeAgg.c:2172) ==17971== by 0x6C4821: ExecProcNode (executor.h:272) ==17971== by 0x6C4821: ExecutePlan (execMain.c:1640) ==17971== by 0x6C4821: standard_ExecutorRun (execMain.c:365) ==17971== by 0x870535: PortalRunSelect (pquery.c:924) ==17971== by 0x871CCE: PortalRun (pquery.c:768) ==17971== by 0x86D552: exec_simple_query (postgres.c:1274) ==17971== Address 0x1eb8c028 is 24 bytes before a block of size 123,876,112 alloc'd ==17971== at 0x4C29F73: malloc (vg_replace_malloc.c:309) ==17971== by 0x9E4204: AllocSetAlloc (aset.c:732) ==17971== by 0x9ED5BD: palloc (mcxt.c:1224) ==17971== by 0x9C704C: pg_detoast_datum_copy (fmgr.c:1821) ==17971== by 0x8952F8: expand_array (array_expanded.c:131) ==17971== by 0x1E971A28: plpgsql_exec_function (pl_exec.c:556) ==17971== by 0x1E97CF83: plpgsql_call_handler (pl_handler.c:277) ==17971== by 0x6BFA4E: ExecInterpExpr (execExprInterp.c:733) ==17971== by 0x6D9C8C: ExecEvalExprSwitchContext (executor.h:354) ==17971== by 0x6D9C8C: ExecProject (executor.h:388) ==17971== by 0x6D9C8C: project_aggregates (nodeAgg.c:1377) ==17971== by 0x6DB2B4: agg_retrieve_direct (nodeAgg.c:2520) ==17971== by 0x6DB2B4: ExecAgg (nodeAgg.c:2172) ==17971== by 0x6C4821: ExecProcNode (executor.h:272) ==17971== by 0x6C4821: ExecutePlan (execMain.c:1640) ==17971== by 0x6C4821: standard_ExecutorRun (execMain.c:365) ==17971== by 0x870535: PortalRunSelect (pquery.c:924) ==17971== Invalid read of size 8 ==17971== at 0x4C2EA0C: memcpy@@GLIBC_2.14 (vg_replace_strmem.c:1035) ==17971== by 0x9C705A: memcpy (string3.h:51) ==17971== by 0x9C705A: pg_detoast_datum_copy (fmgr.c:1823) ==17971== by 0x8952F8: expand_array (array_expanded.c:131) ==17971== by 0x1E971A28: plpgsql_exec_function (pl_exec.c:556) ==17971== by 0x1E97CF83: plpgsql_call_handler (pl_handler.c:277) ==17971== by 0x6BFA4E: ExecInterpExpr (execExprInterp.c:733) ==17971== by 
0x6D9C8C: ExecEvalExprSwitchContext (executor.h:354) ==17971== by 0x6D9C8C: ExecProject (executor.h:388) ==17971== by 0x6D9C8C: project_aggregates (nodeAgg.c:1377) ==17971== by 0x6DB2B4: agg_retrieve_direct (nodeAgg.c:2520) ==17971== by 0x6DB2B4: ExecAgg (nodeAgg.c:2172) ==17971== by 0x6C4821: ExecProcNode (executor.h:272) ==17971== by 0x6C4821: ExecutePlan (execMain.c:1640) ==17971== by 0x6C4821: standard_ExecutorRun (execMain.c:365) ==17971== by 0x870535: PortalRunSelect (pquery.c:924) ==17971== by 0x871CCE: PortalRun (pquery.c:768) ==17971== by 0x86D552: exec_simple_query (postgres.c:1274) ==17971== Address 0x1eb8c028 is 24 bytes before a block of size 123,876,112 alloc'd ==17971== at 0x4C29F73: malloc (vg_replace_malloc.c:309) ==17971== by 0x9E4204: AllocSetAlloc (aset.c:732) ==17971== by 0x9ED5BD: palloc (mcxt.c:1224) ==17971== by 0x9C704C: pg_detoast_datum_copy (fmgr.c:1821) ==17971== by 0x8952F8: expand_array (array_expanded.c:131) ==17971== by 0x1E971A28: plpgsql_exec_function (pl_exec.c:556) ==17971== by 0x1E97CF83: plpgsql_call_handler (pl_handler.c:277) ==17971== by 0x6BFA4E: ExecInterpExpr (execExprInterp.c:733) ==17971== by 0x6D9C8C: ExecEvalExprSwitchContext (executor.h:354) ==17971== by 0x6D9C8C: ExecProject (executor.h:388) ==17971== by 0x6D9C8C: project_aggregates (nodeAgg.c:1377) ==17971== by 0x6DB2B4: agg_retrieve_direct (nodeAgg.c:2520) ==17971== by 0x6DB2B4: ExecAgg (nodeAgg.c:2172) ==17971== by 0x6C4821: ExecProcNode (executor.h:272) ==17971== by 0x6C4821: ExecutePlan (execMain.c:1640) ==17971== by 0x6C4821: standard_ExecutorRun (execMain.c:365) ==17971== by 0x870535: PortalRunSelect (pquery.c:924) ==17971== Invalid read of size 8 ==17971== at 0x4C2EA18: memcpy@@GLIBC_2.14 (vg_replace_strmem.c:1035) ==17971== by 0x9C705A: memcpy (string3.h:51) ==17971== by 0x9C705A: pg_detoast_datum_copy (fmgr.c:1823) ==17971== by 0x8952F8: expand_array (array_expanded.c:131) ==17971== by 0x1E971A28: plpgsql_exec_function (pl_exec.c:556) ==17971== by 
0x1E97CF83: plpgsql_call_handler (pl_handler.c:277) ==17971== by 0x6BFA4E: ExecInterpExpr (execExprInterp.c:733) ==17971== by 0x6D9C8C: ExecEvalExprSwitchContext (executor.h:354) ==17971== by 0x6D9C8C: ExecProject (executor.h:388) ==17971== by 0x6D9C8C: project_aggregates (nodeAgg.c:1377) ==17971== by 0x6DB2B4: agg_retrieve_direct (nodeAgg.c:2520) ==17971== by 0x6DB2B4: ExecAgg (nodeAgg.c:2172) ==17971== by 0x6C4821: ExecProcNode (executor.h:272) ==17971== by 0x6C4821: ExecutePlan (execMain.c:1640) ==17971== by 0x6C4821: standard_ExecutorRun (execMain.c:365) ==17971== by 0x870535: PortalRunSelect (pquery.c:924) ==17971== by 0x871CCE: PortalRun (pquery.c:768) ==17971== by 0x86D552: exec_simple_query (postgres.c:1274) ==17971== Address 0x1eb8c020 is 32 bytes before a block of size 123,879,328 in arena "client" Another instance (compile locally rather than PGDG RPMs, and running the broken commit rather than v16 HEAD): ==30181== Source and destination overlap in memcpy(0x17691078, 0x15f6f8e0, 92126790) ==30181== at 0x4C2E81D: memcpy@@GLIBC_2.14 (vg_replace_strmem.c:1035) ==30181== by 0x98C5DA: pg_detoast_datum_copy (in /home/pryzbyj/git/postgresql/build.autoconf/tmp_install/usr/local/pgsql/bin/postgres) ==30181== by 0x875ADC: expand_array (in /home/pryzbyj/git/postgresql/build.autoconf/tmp_install/usr/local/pgsql/bin/postgres) ==30181== by 0x174757B7: plpgsql_exec_function (in /home/pryzbyj/git/postgresql/build.autoconf/tmp_install/usr/local/pgsql/lib/plpgsql.so) ==30181== by 0x174806B5: plpgsql_call_handler (in /home/pryzbyj/git/postgresql/build.autoconf/tmp_install/usr/local/pgsql/lib/plpgsql.so) ==30181== by 0x694DBD: ExecInterpExpr (in /home/pryzbyj/git/postgresql/build.autoconf/tmp_install/usr/local/pgsql/bin/postgres) ==30181== by 0x69131A: ExecInterpExprStillValid (in /home/pryzbyj/git/postgresql/build.autoconf/tmp_install/usr/local/pgsql/bin/postgres) ==30181== by 0x6AEF2F: project_aggregates (in 
/home/pryzbyj/git/postgresql/build.autoconf/tmp_install/usr/local/pgsql/bin/postgres) ==30181== by 0x6B0169: agg_retrieve_direct (in /home/pryzbyj/git/postgresql/build.autoconf/tmp_install/usr/local/pgsql/bin/postgres) ==30181== by 0x6B0215: ExecAgg (in /home/pryzbyj/git/postgresql/build.autoconf/tmp_install/usr/local/pgsql/bin/postgres) ==30181== by 0x6A1637: ExecProcNodeFirst (in /home/pryzbyj/git/postgresql/build.autoconf/tmp_install/usr/local/pgsql/bin/postgres) ==30181== by 0x6998EC: ExecutePlan (in /home/pryzbyj/git/postgresql/build.autoconf/tmp_install/usr/local/pgsql/bin/postgres) ==30181== Invalid read of size 8 ==30181== at 0x4C2EA0C: memcpy@@GLIBC_2.14 (vg_replace_strmem.c:1035) ==30181== by 0x98C5DA: pg_detoast_datum_copy (in /home/pryzbyj/git/postgresql/build.autoconf/tmp_install/usr/local/pgsql/bin/postgres) ==30181== by 0x875ADC: expand_array (in /home/pryzbyj/git/postgresql/build.autoconf/tmp_install/usr/local/pgsql/bin/postgres) ==30181== by 0x174757B7: plpgsql_exec_function (in /home/pryzbyj/git/postgresql/build.autoconf/tmp_install/usr/local/pgsql/lib/plpgsql.so) ==30181== by 0x174806B5: plpgsql_call_handler (in /home/pryzbyj/git/postgresql/build.autoconf/tmp_install/usr/local/pgsql/lib/plpgsql.so) ==30181== by 0x694DBD: ExecInterpExpr (in /home/pryzbyj/git/postgresql/build.autoconf/tmp_install/usr/local/pgsql/bin/postgres) ==30181== by 0x69131A: ExecInterpExprStillValid (in /home/pryzbyj/git/postgresql/build.autoconf/tmp_install/usr/local/pgsql/bin/postgres) ==30181== by 0x6AEF2F: project_aggregates (in /home/pryzbyj/git/postgresql/build.autoconf/tmp_install/usr/local/pgsql/bin/postgres) ==30181== by 0x6B0169: agg_retrieve_direct (in /home/pryzbyj/git/postgresql/build.autoconf/tmp_install/usr/local/pgsql/bin/postgres) ==30181== by 0x6B0215: ExecAgg (in /home/pryzbyj/git/postgresql/build.autoconf/tmp_install/usr/local/pgsql/bin/postgres) ==30181== by 0x6A1637: ExecProcNodeFirst (in 
/home/pryzbyj/git/postgresql/build.autoconf/tmp_install/usr/local/pgsql/bin/postgres) ==30181== by 0x6998EC: ExecutePlan (in /home/pryzbyj/git/postgresql/build.autoconf/tmp_install/usr/local/pgsql/bin/postgres) ==30181== Address 0x17691038 is 8 bytes before a block of size 92,126,848 alloc'd ==30181== at 0x4C29F73: malloc (vg_replace_malloc.c:309) ==30181== by 0x9A7980: AllocSetAlloc (in /home/pryzbyj/git/postgresql/build.autoconf/tmp_install/usr/local/pgsql/bin/postgres) ==30181== by 0x9B01A7: palloc (in /home/pryzbyj/git/postgresql/build.autoconf/tmp_install/usr/local/pgsql/bin/postgres) ==30181== by 0x98C5C9: pg_detoast_datum_copy (in /home/pryzbyj/git/postgresql/build.autoconf/tmp_install/usr/local/pgsql/bin/postgres) ==30181== by 0x875ADC: expand_array (in /home/pryzbyj/git/postgresql/build.autoconf/tmp_install/usr/local/pgsql/bin/postgres) ==30181== by 0x174757B7: plpgsql_exec_function (in /home/pryzbyj/git/postgresql/build.autoconf/tmp_install/usr/local/pgsql/lib/plpgsql.so) ==30181== by 0x174806B5: plpgsql_call_handler (in /home/pryzbyj/git/postgresql/build.autoconf/tmp_install/usr/local/pgsql/lib/plpgsql.so) ==30181== by 0x694DBD: ExecInterpExpr (in /home/pryzbyj/git/postgresql/build.autoconf/tmp_install/usr/local/pgsql/bin/postgres) ==30181== by 0x69131A: ExecInterpExprStillValid (in /home/pryzbyj/git/postgresql/build.autoconf/tmp_install/usr/local/pgsql/bin/postgres) ==30181== by 0x6AEF2F: project_aggregates (in /home/pryzbyj/git/postgresql/build.autoconf/tmp_install/usr/local/pgsql/bin/postgres) ==30181== by 0x6B0169: agg_retrieve_direct (in /home/pryzbyj/git/postgresql/build.autoconf/tmp_install/usr/local/pgsql/bin/postgres) ==30181== by 0x6B0215: ExecAgg (in /home/pryzbyj/git/postgresql/build.autoconf/tmp_install/usr/local/pgsql/bin/postgres) ==30181== Invalid read of size 8 ==30181== at 0x4C2EA18: memcpy@@GLIBC_2.14 (vg_replace_strmem.c:1035) ==30181== by 0x98C5DA: pg_detoast_datum_copy (in 
/home/pryzbyj/git/postgresql/build.autoconf/tmp_install/usr/local/pgsql/bin/postgres) ==30181== by 0x875ADC: expand_array (in /home/pryzbyj/git/postgresql/build.autoconf/tmp_install/usr/local/pgsql/bin/postgres) ==30181== by 0x174757B7: plpgsql_exec_function (in /home/pryzbyj/git/postgresql/build.autoconf/tmp_install/usr/local/pgsql/lib/plpgsql.so) ==30181== by 0x174806B5: plpgsql_call_handler (in /home/pryzbyj/git/postgresql/build.autoconf/tmp_install/usr/local/pgsql/lib/plpgsql.so) ==30181== by 0x694DBD: ExecInterpExpr (in /home/pryzbyj/git/postgresql/build.autoconf/tmp_install/usr/local/pgsql/bin/postgres) ==30181== by 0x69131A: ExecInterpExprStillValid (in /home/pryzbyj/git/postgresql/build.autoconf/tmp_install/usr/local/pgsql/bin/postgres) ==30181== by 0x6AEF2F: project_aggregates (in /home/pryzbyj/git/postgresql/build.autoconf/tmp_install/usr/local/pgsql/bin/postgres) ==30181== by 0x6B0169: agg_retrieve_direct (in /home/pryzbyj/git/postgresql/build.autoconf/tmp_install/usr/local/pgsql/bin/postgres) ==30181== by 0x6B0215: ExecAgg (in /home/pryzbyj/git/postgresql/build.autoconf/tmp_install/usr/local/pgsql/bin/postgres) ==30181== by 0x6A1637: ExecProcNodeFirst (in /home/pryzbyj/git/postgresql/build.autoconf/tmp_install/usr/local/pgsql/bin/postgres) ==30181== by 0x6998EC: ExecutePlan (in /home/pryzbyj/git/postgresql/build.autoconf/tmp_install/usr/local/pgsql/bin/postgres) ==30181== Address 0x17691030 is 16 bytes before a block of size 92,126,848 alloc'd ==30181== at 0x4C29F73: malloc (vg_replace_malloc.c:309) ==30181== by 0x9A7980: AllocSetAlloc (in /home/pryzbyj/git/postgresql/build.autoconf/tmp_install/usr/local/pgsql/bin/postgres) ==30181== by 0x9B01A7: palloc (in /home/pryzbyj/git/postgresql/build.autoconf/tmp_install/usr/local/pgsql/bin/postgres) ==30181== by 0x98C5C9: pg_detoast_datum_copy (in /home/pryzbyj/git/postgresql/build.autoconf/tmp_install/usr/local/pgsql/bin/postgres) ==30181== by 0x875ADC: expand_array (in 
/home/pryzbyj/git/postgresql/build.autoconf/tmp_install/usr/local/pgsql/bin/postgres) ==30181== by 0x174757B7: plpgsql_exec_function (in /home/pryzbyj/git/postgresql/build.autoconf/tmp_install/usr/local/pgsql/lib/plpgsql.so) ==30181== by 0x174806B5: plpgsql_call_handler (in /home/pryzbyj/git/postgresql/build.autoconf/tmp_install/usr/local/pgsql/lib/plpgsql.so) ==30181== by 0x694DBD: ExecInterpExpr (in /home/pryzbyj/git/postgresql/build.autoconf/tmp_install/usr/local/pgsql/bin/postgres) ==30181== by 0x69131A: ExecInterpExprStillValid (in /home/pryzbyj/git/postgresql/build.autoconf/tmp_install/usr/local/pgsql/bin/postgres) ==30181== by 0x6AEF2F: project_aggregates (in /home/pryzbyj/git/postgresql/build.autoconf/tmp_install/usr/local/pgsql/bin/postgres) ==30181== by 0x6B0169: agg_retrieve_direct (in /home/pryzbyj/git/postgresql/build.autoconf/tmp_install/usr/local/pgsql/bin/postgres) ==30181== by 0x6B0215: ExecAgg (in /home/pryzbyj/git/postgresql/build.autoconf/tmp_install/usr/local/pgsql/bin/postgres) ==30181== Invalid read of size 8 ==30181== at 0x4C2EA20: memcpy@@GLIBC_2.14 (vg_replace_strmem.c:1035) ==30181== by 0x98C5DA: pg_detoast_datum_copy (in /home/pryzbyj/git/postgresql/build.autoconf/tmp_install/usr/local/pgsql/bin/postgres) ==30181== by 0x875ADC: expand_array (in /home/pryzbyj/git/postgresql/build.autoconf/tmp_install/usr/local/pgsql/bin/postgres) ==30181== by 0x174757B7: plpgsql_exec_function (in /home/pryzbyj/git/postgresql/build.autoconf/tmp_install/usr/local/pgsql/lib/plpgsql.so) ==30181== by 0x174806B5: plpgsql_call_handler (in /home/pryzbyj/git/postgresql/build.autoconf/tmp_install/usr/local/pgsql/lib/plpgsql.so) ==30181== by 0x694DBD: ExecInterpExpr (in /home/pryzbyj/git/postgresql/build.autoconf/tmp_install/usr/local/pgsql/bin/postgres) ==30181== by 0x69131A: ExecInterpExprStillValid (in /home/pryzbyj/git/postgresql/build.autoconf/tmp_install/usr/local/pgsql/bin/postgres) ==30181== by 0x6AEF2F: project_aggregates (in 
/home/pryzbyj/git/postgresql/build.autoconf/tmp_install/usr/local/pgsql/bin/postgres) ==30181== by 0x6B0169: agg_retrieve_direct (in /home/pryzbyj/git/postgresql/build.autoconf/tmp_install/usr/local/pgsql/bin/postgres) ==30181== by 0x6B0215: ExecAgg (in /home/pryzbyj/git/postgresql/build.autoconf/tmp_install/usr/local/pgsql/bin/postgres) ==30181== by 0x6A1637: ExecProcNodeFirst (in /home/pryzbyj/git/postgresql/build.autoconf/tmp_install/usr/local/pgsql/bin/postgres) ==30181== by 0x6998EC: ExecutePlan (in /home/pryzbyj/git/postgresql/build.autoconf/tmp_install/usr/local/pgsql/bin/postgres) ==30181== Address 0x17691028 is 24 bytes before a block of size 92,126,848 alloc'd ==30181== at 0x4C29F73: malloc (vg_replace_malloc.c:309) ==30181== by 0x9A7980: AllocSetAlloc (in /home/pryzbyj/git/postgresql/build.autoconf/tmp_install/usr/local/pgsql/bin/postgres) ==30181== by 0x9B01A7: palloc (in /home/pryzbyj/git/postgresql/build.autoconf/tmp_install/usr/local/pgsql/bin/postgres) ==30181== by 0x98C5C9: pg_detoast_datum_copy (in /home/pryzbyj/git/postgresql/build.autoconf/tmp_install/usr/local/pgsql/bin/postgres) ==30181== by 0x875ADC: expand_array (in /home/pryzbyj/git/postgresql/build.autoconf/tmp_install/usr/local/pgsql/bin/postgres) ==30181== by 0x174757B7: plpgsql_exec_function (in /home/pryzbyj/git/postgresql/build.autoconf/tmp_install/usr/local/pgsql/lib/plpgsql.so) ==30181== by 0x174806B5: plpgsql_call_handler (in /home/pryzbyj/git/postgresql/build.autoconf/tmp_install/usr/local/pgsql/lib/plpgsql.so) ==30181== by 0x694DBD: ExecInterpExpr (in /home/pryzbyj/git/postgresql/build.autoconf/tmp_install/usr/local/pgsql/bin/postgres) ==30181== by 0x69131A: ExecInterpExprStillValid (in /home/pryzbyj/git/postgresql/build.autoconf/tmp_install/usr/local/pgsql/bin/postgres) ==30181== by 0x6AEF2F: project_aggregates (in /home/pryzbyj/git/postgresql/build.autoconf/tmp_install/usr/local/pgsql/bin/postgres) ==30181== by 0x6B0169: agg_retrieve_direct (in 
/home/pryzbyj/git/postgresql/build.autoconf/tmp_install/usr/local/pgsql/bin/postgres) ==30181== by 0x6B0215: ExecAgg (in /home/pryzbyj/git/postgresql/build.autoconf/tmp_install/usr/local/pgsql/bin/postgres) ==30181== ==30181== Invalid read of size 8 ==30181== at 0x4C2EA28: memcpy@@GLIBC_2.14 (vg_replace_strmem.c:1035) ==30181== by 0x98C5DA: pg_detoast_datum_copy (in /home/pryzbyj/git/postgresql/build.autoconf/tmp_install/usr/local/pgsql/bin/postgres) ==30181== by 0x875ADC: expand_array (in /home/pryzbyj/git/postgresql/build.autoconf/tmp_install/usr/local/pgsql/bin/postgres) ==30181== by 0x174757B7: plpgsql_exec_function (in /home/pryzbyj/git/postgresql/build.autoconf/tmp_install/usr/local/pgsql/lib/plpgsql.so) ==30181== by 0x174806B5: plpgsql_call_handler (in /home/pryzbyj/git/postgresql/build.autoconf/tmp_install/usr/local/pgsql/lib/plpgsql.so) ==30181== by 0x694DBD: ExecInterpExpr (in /home/pryzbyj/git/postgresql/build.autoconf/tmp_install/usr/local/pgsql/bin/postgres) ==30181== by 0x69131A: ExecInterpExprStillValid (in /home/pryzbyj/git/postgresql/build.autoconf/tmp_install/usr/local/pgsql/bin/postgres) ==30181== by 0x6AEF2F: project_aggregates (in /home/pryzbyj/git/postgresql/build.autoconf/tmp_install/usr/local/pgsql/bin/postgres) ==30181== by 0x6B0169: agg_retrieve_direct (in /home/pryzbyj/git/postgresql/build.autoconf/tmp_install/usr/local/pgsql/bin/postgres) ==30181== by 0x6B0215: ExecAgg (in /home/pryzbyj/git/postgresql/build.autoconf/tmp_install/usr/local/pgsql/bin/postgres) ==30181== by 0x6A1637: ExecProcNodeFirst (in /home/pryzbyj/git/postgresql/build.autoconf/tmp_install/usr/local/pgsql/bin/postgres) ==30181== by 0x6998EC: ExecutePlan (in /home/pryzbyj/git/postgresql/build.autoconf/tmp_install/usr/local/pgsql/bin/postgres) ==30181== Address 0x17691020 is 32 bytes before a block of size 92,127,136 in arena "client"
On Sat, 15 Apr 2023 at 13:03, Justin Pryzby <pryzby@telsasoft.com> wrote:
> Maybe you'll find valgrind errors to be helpful.

I don't think that's really going to help. The crash already tells us
there's a problem down the line, but if the commit you mention is to
blame for this, then the problem is elsewhere, either in our
assumption that we can get away without the datumCopy() or in the
aggregate function producing the state that we're no longer copying.

David
David Rowley <dgrowleyml@gmail.com> writes:
> I don't think that's really going to help. The crash already tells us
> there's a problem down the line, but if the commit you mention is to
> blame for this, then the problem is elsewhere, either in our
> assumption that we can get away without the datumCopy() or in the
> aggregate function producing the state that we're no longer copying.

It does smell like the aggregate output has been corrupted by the time
it got to the plpgsql function. I don't particularly want to try to
synthesize a test case from the essentially-zero SQL-level information
we've been provided, though. And I doubt we can track this down without
a test case. So please try to sanitize the case you have enough that
you can share it.

regards, tom lane
On Sat, Apr 15, 2023 at 11:33:58AM +1200, David Rowley wrote:
> On Sat, 15 Apr 2023 at 10:48, Justin Pryzby <pryzby@telsasoft.com> wrote:
> >
> > On Sat, Apr 15, 2023 at 10:04:52AM +1200, David Rowley wrote:
> > > Which aggregate function is being called here? Is it a custom
> > > aggregate written in C, by any chance?
> >
> > That function is not an aggregate:
>
> There's an aggregate somewhere as indicated by this fragment from the
> stack trace:
>
> > #12 project_aggregates (aggstate=aggstate@entry=0x607200070d38) at ../src/backend/executor/nodeAgg.c:1377
> > #13 0x0000000000903eb6 in agg_retrieve_direct (aggstate=aggstate@entry=0x607200070d38) at ../src/backend/executor/nodeAgg.c:2520
> > #14 0x0000000000904074 in ExecAgg (pstate=0x607200070d38) at ../src/backend/executor/nodeAgg.c:2172
>
> Any chance you could try and come up with a minimal reproducer? You
> have access to see which aggregates are being used here and what data
> types are being given to them and then what's being done with the
> return value of that aggregate that's causing the crash. Maybe you
> can still get the crash if you mock up some data to aggregate and
> strip out the guts from the plpgsql functions that we're crashing on?

Try this
Attachment
Justin Pryzby <pryzby@telsasoft.com> writes:
> On Sat, Apr 15, 2023 at 11:33:58AM +1200, David Rowley wrote:
>> Any chance you could try and come up with a minimal reproducer?

> Try this

Thanks. I see the problem: finalize_aggregate is no longer forcing
a R/W expanded datum returned by the finalfn into R/O form. If
we re-use the aggregate result in multiple places, as this query
does, then the first use can clobber the value for later uses.
(The commit message specifically mentions this concern, so I wonder
how we failed to actually do it :-()

A minimal fix would be to force to R/O before returning from
finalize_aggregate, but I wonder if we should do it later.

By the by, I couldn't help noticing that ExecAggTransReparent
completely fails to do what its name promises it should do, ie
reparent a R/W datum into the proper context instead of physically
copying it. That looks suspiciously like something that got broken
during some other refactoring somewhere along the line. That'd be
a performance bug not a correctness bug, but it should be looked into.

regards, tom lane