Thread: v16dev: invalid memory alloc request size 8488348128

v16dev: invalid memory alloc request size 8488348128

From: Justin Pryzby

I hit this elog() while testing reports under v16 and changed to PANIC
to help diagnose.

DETAILS: PANIC:  invalid memory alloc request size 18446744072967930808
CONTEXT:  PL/pgSQL function array_weight(real[],real[]) while storing call arguments into local variables
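
(Editorial note on the number in the PANIC line: 18446744072967930808 is just short of 2^64, which is what a negative signed value looks like after conversion to a 64-bit unsigned size. The sketch below only does that arithmetic. It assumes a 64-bit size type, and the helper name distance_below_2_64 is made up for illustration; it is not a PostgreSQL function.)

```c
#include <stdint.h>

/*
 * Hypothetical helper, not PostgreSQL code: how far an unsigned 64-bit
 * value sits below 2^64.  Unsigned subtraction wraps modulo 2^64, so the
 * result is well defined.  If the value came from converting a negative
 * number -k to an unsigned 64-bit type, the result is k.
 */
uint64_t
distance_below_2_64(uint64_t size)
{
    return (uint64_t) 0 - size;
}
```

Called with 18446744072967930808 this returns 741620808, so the request size in the PANIC message is what -741620808 looks like as a 64-bit unsigned value.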

I can't share the query, data, nor plpgsql functions themselves.

I reproduced the problem at this commit, but not at its parent.

commit 42b746d4c982257bf3f924176632b04dc288174b (HEAD)
Author: Tom Lane <tgl@sss.pgh.pa.us>
Date:   Thu Oct 6 13:27:34 2022 -0400

    Remove uses of MemoryContextContains in nodeAgg.c and
    nodeWindowAgg.c.

#2  0x0000000001067af5 in errfinish (filename=filename@entry=0x168f1e0 "../src/backend/utils/mmgr/mcxt.c",
lineno=lineno@entry=1013,
    funcname=funcname@entry=0x16901a0 <__func__.17850> "MemoryContextAlloc") at ../src/backend/utils/error/elog.c:604
#3  0x00000000010c57c7 in MemoryContextAlloc (context=context@entry=0x604200032600, size=size@entry=8488348128) at
../src/backend/utils/mmgr/mcxt.c:1013
#4  0x0000000000db49a4 in copy_byval_expanded_array (eah=eah@entry=0x604200032718, oldeah=0x604200032718) at
../src/backend/utils/adt/array_expanded.c:195
#5  0x0000000000db5f7a in expand_array (arraydatum=105836584314672, parentcontext=<optimized out>,
metacache=0x7ffcbd2d29c0, metacache@entry=0x0)
    at ../src/backend/utils/adt/array_expanded.c:104
#6  0x00007f6c05a6b4d0 in plpgsql_exec_function (func=func@entry=0x6092004a4c58, fcinfo=fcinfo@entry=0x7f6c04f7efc8,
simple_eval_estate=simple_eval_estate@entry=0x0,
    simple_eval_resowner=simple_eval_resowner@entry=0x0, procedure_resowner=procedure_resowner@entry=0x0,
atomic=atomic@entry=true)
    at ../src/pl/plpgsql/src/pl_exec.c:556
#7  0x00007f6c05a76af4 in plpgsql_call_handler (fcinfo=<optimized out>) at ../src/pl/plpgsql/src/pl_handler.c:277
#8  0x00000000008b30cd in ExecInterpExpr (state=0x7f6c04fd6750, econtext=0x6072000712d0, isnull=0x7ffcbd2d2fa0) at
../src/backend/executor/execExprInterp.c:733
#9  0x00000000008a6c5f in ExecInterpExprStillValid (state=0x7f6c04fd6750, econtext=0x6072000712d0,
isNull=0x7ffcbd2d2fa0)
    at ../src/backend/executor/execExprInterp.c:1858
#10 0x000000000090032b in ExecEvalExprSwitchContext (isNull=0x7ffcbd2d2fa0, econtext=0x6072000712d0,
state=0x7f6c04fd6750) at ../src/include/executor/executor.h:354
#11 ExecProject (projInfo=0x7f6c04fd6748) at ../src/include/executor/executor.h:388
#12 project_aggregates (aggstate=aggstate@entry=0x607200070d38) at ../src/backend/executor/nodeAgg.c:1377
#13 0x0000000000903eb6 in agg_retrieve_direct (aggstate=aggstate@entry=0x607200070d38) at
../src/backend/executor/nodeAgg.c:2520
#14 0x0000000000904074 in ExecAgg (pstate=0x607200070d38) at ../src/backend/executor/nodeAgg.c:2172
#15 0x00000000008d90e0 in ExecProcNodeFirst (node=0x607200070d38) at ../src/backend/executor/execProcnode.c:464
#16 0x00000000008c1e5f in ExecProcNode (node=0x607200070d38) at ../src/include/executor/executor.h:272
#17 ExecutePlan (estate=estate@entry=0x607200070a18, planstate=0x607200070d38, use_parallel_mode=false,
operation=operation@entry=CMD_SELECT, sendTuples=true,
    numberTuples=numberTuples@entry=0, direction=direction@entry=ForwardScanDirection, dest=dest@entry=0x7f6c051abd28,
execute_once=execute_once@entry=true)
    at ../src/backend/executor/execMain.c:1640
#18 0x00000000008c3ffb in standard_ExecutorRun (queryDesc=0x604200016998, direction=ForwardScanDirection, count=0,
execute_once=<optimized out>)
    at ../src/backend/executor/execMain.c:365
#19 0x00000000008c4125 in ExecutorRun (queryDesc=queryDesc@entry=0x604200016998,
direction=direction@entry=ForwardScanDirection, count=count@entry=0,
    execute_once=<optimized out>) at ../src/backend/executor/execMain.c:309
#20 0x0000000000d5d148 in PortalRunSelect (portal=portal@entry=0x607200028a18, forward=forward@entry=true, count=0,
count@entry=9223372036854775807,
    dest=dest@entry=0x7f6c051abd28) at ../src/backend/tcop/pquery.c:924
#21 0x0000000000d60dc8 in PortalRun (portal=portal@entry=0x607200028a18, count=count@entry=9223372036854775807,
isTopLevel=isTopLevel@entry=true,
    run_once=run_once@entry=true, dest=dest@entry=0x7f6c051abd28, altdest=altdest@entry=0x7f6c051abd28,
qc=<optimized out>, qc@entry=0x7ffcbd2d3580)
    at ../src/backend/tcop/pquery.c:768
#22 0x0000000000d595fd in exec_simple_query (
    query_string=query_string@entry=0x6082000cf238 "...
#23 0x0000000000d5c72c in PostgresMain (dbname=dbname@entry=0x60820000b378 "postgres",
username=username@entry=0x60820000b358 "telsasoft")
    at ../src/backend/tcop/postgres.c:4632
#24 0x0000000000bddc19 in BackendRun (port=port@entry=0x60300000fc40) at ../src/backend/postmaster/postmaster.c:4461
#25 0x0000000000be2583 in BackendStartup (port=port@entry=0x60300000fc40) at
../src/backend/postmaster/postmaster.c:4189
#26 0x0000000000be2a05 in ServerLoop () at ../src/backend/postmaster/postmaster.c:1779
#27 0x0000000000be436b in PostmasterMain (argc=argc@entry=9, argv=argv@entry=0x600e0000df40) at
../src/backend/postmaster/postmaster.c:1463
#28 0x00000000009c33d5 in main (argc=9, argv=0x600e0000df40) at ../src/backend/main/main.c:200

(gdb) fr 4
#4  0x0000000000db49a4 in copy_byval_expanded_array (eah=eah@entry=0x604200032718, oldeah=0x604200032718) at
../src/backend/utils/adt/array_expanded.c:195
195             eah->dims = (int *) MemoryContextAlloc(objcxt, ndims * 2 * sizeof(int));
(gdb) p ndims
$1 = 1061043516
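
(Editorial note: the numbers in this frame are self-consistent. The bogus ndims, multiplied out as on line 195, gives exactly the request size in frame #3 and in the thread subject. The sketch below checks that arithmetic. It assumes a 4-byte int, the function names are hypothetical, and TOY_MAXDIM mirrors PostgreSQL's MAXDIM limit of 6 array dimensions.)

```c
#include <stdint.h>

/* Mirrors PostgreSQL's MAXDIM (utils/array.h); renamed to mark it as a sketch. */
#define TOY_MAXDIM 6

/*
 * Hypothetical stand-in for the size expression on array_expanded.c:195,
 * "ndims * 2 * sizeof(int)", with sizeof(int) fixed at 4 so the result
 * does not depend on the platform compiling this sketch.
 */
uint64_t
dims_alloc_size(int32_t ndims)
{
    /* Widen first so the multiplication is done in 64 bits. */
    return (uint64_t) ndims * 2 * 4;
}

/* A real array header can only carry between 0 and MAXDIM dimensions. */
int
ndims_is_plausible(int32_t ndims)
{
    return ndims >= 0 && ndims <= TOY_MAXDIM;
}
```

dims_alloc_size(1061043516) is 8488348128, and ndims_is_plausible(1061043516) is 0. The value is 0x3F3E3D3C in hex, so whatever was read as the array header here was not a valid header.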

-- 
Justin



Re: v16dev: invalid memory alloc request size 8488348128

From: David Rowley

On Sat, 15 Apr 2023 at 08:36, Justin Pryzby <pryzby@telsasoft.com> wrote:
>
> I hit this elog() while testing reports under v16 and changed to PANIC
> to help diagnose.
>
> DETAILS: PANIC:  invalid memory alloc request size 18446744072967930808
> CONTEXT:  PL/pgSQL function array_weight(real[],real[]) while storing call arguments into local variables
>
> I can't share the query, data, nor plpgsql functions themselves.

Which aggregate function is being called here?  Is it a custom
aggregate written in C, by any chance?

David



Re: v16dev: invalid memory alloc request size 8488348128

From: Justin Pryzby

On Sat, Apr 15, 2023 at 10:04:52AM +1200, David Rowley wrote:
> On Sat, 15 Apr 2023 at 08:36, Justin Pryzby <pryzby@telsasoft.com> wrote:
> >
> > I hit this elog() while testing reports under v16 and changed to PANIC
> > to help diagnose.
> >
> > DETAILS: PANIC:  invalid memory alloc request size 18446744072967930808
> > CONTEXT:  PL/pgSQL function array_weight(real[],real[]) while storing call arguments into local variables
> >
> > I can't share the query, data, nor plpgsql functions themselves.
> 
> Which aggregate function is being called here?  Is it a custom
> aggregate written in C, by any chance?

That function is not an aggregate:

 ts=# \sf array_weight
 CREATE OR REPLACE FUNCTION public.array_weight(real[], real[])
  RETURNS real
   LANGUAGE plpgsql
    IMMUTABLE PARALLEL SAFE

And we don't have any C code loaded to postgres.  We do have polymorphic
aggregate functions using anycompatiblearray [*], and array_weight is
being called several times with those aggregates as its arguments.

*As in:

9e38c2bb5093ceb0c04d6315ccd8975bd17add66
97f73a978fc1aca59c6ad765548ce0096d95a923
09878cdd489ff7aca761998e7cb104f4fd98ae02




Re: v16dev: invalid memory alloc request size 8488348128

From: David Rowley

On Sat, 15 Apr 2023 at 10:48, Justin Pryzby <pryzby@telsasoft.com> wrote:
>
> On Sat, Apr 15, 2023 at 10:04:52AM +1200, David Rowley wrote:
> > Which aggregate function is being called here?  Is it a custom
> > aggregate written in C, by any chance?
>
> That function is not an aggregate:

There's an aggregate somewhere as indicated by this fragment from the
stack trace:

> #12 project_aggregates (aggstate=aggstate@entry=0x607200070d38) at ../src/backend/executor/nodeAgg.c:1377
> #13 0x0000000000903eb6 in agg_retrieve_direct (aggstate=aggstate@entry=0x607200070d38) at ../src/backend/executor/nodeAgg.c:2520
> #14 0x0000000000904074 in ExecAgg (pstate=0x607200070d38) at ../src/backend/executor/nodeAgg.c:2172

Any chance you could try and come up with a minimal reproducer?  You
have access to see which aggregates are being used here and what data
types are being given to them and then what's being done with the
return value of that aggregate that's causing the crash.  Maybe you
can still get the crash if you mock up some data to aggregate and
strip out the guts from the plpgsql functions that we're crashing on?

David



Re: v16dev: invalid memory alloc request size 8488348128

From: Tom Lane

David Rowley <dgrowleyml@gmail.com> writes:
> Any chance you could try and come up with a minimal reproducer?

Yeah --- there's an awful lot of moving parts there, and a stack
trace is not much to go on.

            regards, tom lane



Re: v16dev: invalid memory alloc request size 8488348128

From: Justin Pryzby

Maybe you'll find valgrind errors to be helpful.

==17971== Source and destination overlap in memcpy(0x1eb8c078, 0x1d88cb20, 123876054)
==17971==    at 0x4C2E81D: memcpy@@GLIBC_2.14 (vg_replace_strmem.c:1035)
==17971==    by 0x9C705A: memcpy (string3.h:51)
==17971==    by 0x9C705A: pg_detoast_datum_copy (fmgr.c:1823)
==17971==    by 0x8952F8: expand_array (array_expanded.c:131)
==17971==    by 0x1E971A28: plpgsql_exec_function (pl_exec.c:556)
==17971==    by 0x1E97CF83: plpgsql_call_handler (pl_handler.c:277)
==17971==    by 0x6BFA4E: ExecInterpExpr (execExprInterp.c:733)
==17971==    by 0x6D9C8C: ExecEvalExprSwitchContext (executor.h:354)
==17971==    by 0x6D9C8C: ExecProject (executor.h:388)
==17971==    by 0x6D9C8C: project_aggregates (nodeAgg.c:1377)
==17971==    by 0x6DB2B4: agg_retrieve_direct (nodeAgg.c:2520)
==17971==    by 0x6DB2B4: ExecAgg (nodeAgg.c:2172)
==17971==    by 0x6C4821: ExecProcNode (executor.h:272)
==17971==    by 0x6C4821: ExecutePlan (execMain.c:1640)
==17971==    by 0x6C4821: standard_ExecutorRun (execMain.c:365)
==17971==    by 0x870535: PortalRunSelect (pquery.c:924)
==17971==    by 0x871CCE: PortalRun (pquery.c:768)
==17971==    by 0x86D552: exec_simple_query (postgres.c:1274)

==17971== Invalid read of size 8
==17971==    at 0x4C2EA20: memcpy@@GLIBC_2.14 (vg_replace_strmem.c:1035)
==17971==    by 0x9C705A: memcpy (string3.h:51)
==17971==    by 0x9C705A: pg_detoast_datum_copy (fmgr.c:1823)
==17971==    by 0x8952F8: expand_array (array_expanded.c:131)
==17971==    by 0x1E971A28: plpgsql_exec_function (pl_exec.c:556)
==17971==    by 0x1E97CF83: plpgsql_call_handler (pl_handler.c:277)
==17971==    by 0x6BFA4E: ExecInterpExpr (execExprInterp.c:733)
==17971==    by 0x6D9C8C: ExecEvalExprSwitchContext (executor.h:354)
==17971==    by 0x6D9C8C: ExecProject (executor.h:388)
==17971==    by 0x6D9C8C: project_aggregates (nodeAgg.c:1377)
==17971==    by 0x6DB2B4: agg_retrieve_direct (nodeAgg.c:2520)
==17971==    by 0x6DB2B4: ExecAgg (nodeAgg.c:2172)
==17971==    by 0x6C4821: ExecProcNode (executor.h:272)
==17971==    by 0x6C4821: ExecutePlan (execMain.c:1640)
==17971==    by 0x6C4821: standard_ExecutorRun (execMain.c:365)
==17971==    by 0x870535: PortalRunSelect (pquery.c:924)
==17971==    by 0x871CCE: PortalRun (pquery.c:768)
==17971==    by 0x86D552: exec_simple_query (postgres.c:1274)
==17971==  Address 0x1eb8c038 is 8 bytes before a block of size 123,876,112 alloc'd
==17971==    at 0x4C29F73: malloc (vg_replace_malloc.c:309)
==17971==    by 0x9E4204: AllocSetAlloc (aset.c:732)
==17971==    by 0x9ED5BD: palloc (mcxt.c:1224)
==17971==    by 0x9C704C: pg_detoast_datum_copy (fmgr.c:1821)
==17971==    by 0x8952F8: expand_array (array_expanded.c:131)
==17971==    by 0x1E971A28: plpgsql_exec_function (pl_exec.c:556)
==17971==    by 0x1E97CF83: plpgsql_call_handler (pl_handler.c:277)
==17971==    by 0x6BFA4E: ExecInterpExpr (execExprInterp.c:733)
==17971==    by 0x6D9C8C: ExecEvalExprSwitchContext (executor.h:354)
==17971==    by 0x6D9C8C: ExecProject (executor.h:388)
==17971==    by 0x6D9C8C: project_aggregates (nodeAgg.c:1377)
==17971==    by 0x6DB2B4: agg_retrieve_direct (nodeAgg.c:2520)
==17971==    by 0x6DB2B4: ExecAgg (nodeAgg.c:2172)
==17971==    by 0x6C4821: ExecProcNode (executor.h:272)
==17971==    by 0x6C4821: ExecutePlan (execMain.c:1640)
==17971==    by 0x6C4821: standard_ExecutorRun (execMain.c:365)
==17971==    by 0x870535: PortalRunSelect (pquery.c:924)

==17971== Invalid read of size 8
==17971==    at 0x4C2EA28: memcpy@@GLIBC_2.14 (vg_replace_strmem.c:1035)
==17971==    by 0x9C705A: memcpy (string3.h:51)
==17971==    by 0x9C705A: pg_detoast_datum_copy (fmgr.c:1823)
==17971==    by 0x8952F8: expand_array (array_expanded.c:131)
==17971==    by 0x1E971A28: plpgsql_exec_function (pl_exec.c:556)
==17971==    by 0x1E97CF83: plpgsql_call_handler (pl_handler.c:277)
==17971==    by 0x6BFA4E: ExecInterpExpr (execExprInterp.c:733)
==17971==    by 0x6D9C8C: ExecEvalExprSwitchContext (executor.h:354)
==17971==    by 0x6D9C8C: ExecProject (executor.h:388)
==17971==    by 0x6D9C8C: project_aggregates (nodeAgg.c:1377)
==17971==    by 0x6DB2B4: agg_retrieve_direct (nodeAgg.c:2520)
==17971==    by 0x6DB2B4: ExecAgg (nodeAgg.c:2172)
==17971==    by 0x6C4821: ExecProcNode (executor.h:272)
==17971==    by 0x6C4821: ExecutePlan (execMain.c:1640)
==17971==    by 0x6C4821: standard_ExecutorRun (execMain.c:365)
==17971==    by 0x870535: PortalRunSelect (pquery.c:924)
==17971==    by 0x871CCE: PortalRun (pquery.c:768)
==17971==    by 0x86D552: exec_simple_query (postgres.c:1274)
==17971==  Address 0x1eb8c030 is 16 bytes before a block of size 123,876,112 alloc'd
==17971==    at 0x4C29F73: malloc (vg_replace_malloc.c:309)
==17971==    by 0x9E4204: AllocSetAlloc (aset.c:732)
==17971==    by 0x9ED5BD: palloc (mcxt.c:1224)
==17971==    by 0x9C704C: pg_detoast_datum_copy (fmgr.c:1821)
==17971==    by 0x8952F8: expand_array (array_expanded.c:131)
==17971==    by 0x1E971A28: plpgsql_exec_function (pl_exec.c:556)
==17971==    by 0x1E97CF83: plpgsql_call_handler (pl_handler.c:277)
==17971==    by 0x6BFA4E: ExecInterpExpr (execExprInterp.c:733)
==17971==    by 0x6D9C8C: ExecEvalExprSwitchContext (executor.h:354)
==17971==    by 0x6D9C8C: ExecProject (executor.h:388)
==17971==    by 0x6D9C8C: project_aggregates (nodeAgg.c:1377)
==17971==    by 0x6DB2B4: agg_retrieve_direct (nodeAgg.c:2520)
==17971==    by 0x6DB2B4: ExecAgg (nodeAgg.c:2172)
==17971==    by 0x6C4821: ExecProcNode (executor.h:272)
==17971==    by 0x6C4821: ExecutePlan (execMain.c:1640)
==17971==    by 0x6C4821: standard_ExecutorRun (execMain.c:365)
==17971==    by 0x870535: PortalRunSelect (pquery.c:924)

==17971== Invalid read of size 8
==17971==    at 0x4C2EA0C: memcpy@@GLIBC_2.14 (vg_replace_strmem.c:1035)
==17971==    by 0x9C705A: memcpy (string3.h:51)
==17971==    by 0x9C705A: pg_detoast_datum_copy (fmgr.c:1823)
==17971==    by 0x8952F8: expand_array (array_expanded.c:131)
==17971==    by 0x1E971A28: plpgsql_exec_function (pl_exec.c:556)
==17971==    by 0x1E97CF83: plpgsql_call_handler (pl_handler.c:277)
==17971==    by 0x6BFA4E: ExecInterpExpr (execExprInterp.c:733)
==17971==    by 0x6D9C8C: ExecEvalExprSwitchContext (executor.h:354)
==17971==    by 0x6D9C8C: ExecProject (executor.h:388)
==17971==    by 0x6D9C8C: project_aggregates (nodeAgg.c:1377)
==17971==    by 0x6DB2B4: agg_retrieve_direct (nodeAgg.c:2520)
==17971==    by 0x6DB2B4: ExecAgg (nodeAgg.c:2172)
==17971==    by 0x6C4821: ExecProcNode (executor.h:272)
==17971==    by 0x6C4821: ExecutePlan (execMain.c:1640)
==17971==    by 0x6C4821: standard_ExecutorRun (execMain.c:365)
==17971==    by 0x870535: PortalRunSelect (pquery.c:924)
==17971==    by 0x871CCE: PortalRun (pquery.c:768)
==17971==    by 0x86D552: exec_simple_query (postgres.c:1274)
==17971==  Address 0x1eb8c028 is 24 bytes before a block of size 123,876,112 alloc'd
==17971==    at 0x4C29F73: malloc (vg_replace_malloc.c:309)
==17971==    by 0x9E4204: AllocSetAlloc (aset.c:732)
==17971==    by 0x9ED5BD: palloc (mcxt.c:1224)
==17971==    by 0x9C704C: pg_detoast_datum_copy (fmgr.c:1821)
==17971==    by 0x8952F8: expand_array (array_expanded.c:131)
==17971==    by 0x1E971A28: plpgsql_exec_function (pl_exec.c:556)
==17971==    by 0x1E97CF83: plpgsql_call_handler (pl_handler.c:277)
==17971==    by 0x6BFA4E: ExecInterpExpr (execExprInterp.c:733)
==17971==    by 0x6D9C8C: ExecEvalExprSwitchContext (executor.h:354)
==17971==    by 0x6D9C8C: ExecProject (executor.h:388)
==17971==    by 0x6D9C8C: project_aggregates (nodeAgg.c:1377)
==17971==    by 0x6DB2B4: agg_retrieve_direct (nodeAgg.c:2520)
==17971==    by 0x6DB2B4: ExecAgg (nodeAgg.c:2172)
==17971==    by 0x6C4821: ExecProcNode (executor.h:272)
==17971==    by 0x6C4821: ExecutePlan (execMain.c:1640)
==17971==    by 0x6C4821: standard_ExecutorRun (execMain.c:365)
==17971==    by 0x870535: PortalRunSelect (pquery.c:924)



==17971== Invalid read of size 8
==17971==    at 0x4C2EA18: memcpy@@GLIBC_2.14 (vg_replace_strmem.c:1035)
==17971==    by 0x9C705A: memcpy (string3.h:51)
==17971==    by 0x9C705A: pg_detoast_datum_copy (fmgr.c:1823)
==17971==    by 0x8952F8: expand_array (array_expanded.c:131)
==17971==    by 0x1E971A28: plpgsql_exec_function (pl_exec.c:556)
==17971==    by 0x1E97CF83: plpgsql_call_handler (pl_handler.c:277)
==17971==    by 0x6BFA4E: ExecInterpExpr (execExprInterp.c:733)
==17971==    by 0x6D9C8C: ExecEvalExprSwitchContext (executor.h:354)
==17971==    by 0x6D9C8C: ExecProject (executor.h:388)
==17971==    by 0x6D9C8C: project_aggregates (nodeAgg.c:1377)
==17971==    by 0x6DB2B4: agg_retrieve_direct (nodeAgg.c:2520)
==17971==    by 0x6DB2B4: ExecAgg (nodeAgg.c:2172)
==17971==    by 0x6C4821: ExecProcNode (executor.h:272)
==17971==    by 0x6C4821: ExecutePlan (execMain.c:1640)
==17971==    by 0x6C4821: standard_ExecutorRun (execMain.c:365)
==17971==    by 0x870535: PortalRunSelect (pquery.c:924)
==17971==    by 0x871CCE: PortalRun (pquery.c:768)
==17971==    by 0x86D552: exec_simple_query (postgres.c:1274)
==17971==  Address 0x1eb8c020 is 32 bytes before a block of size 123,879,328 in arena "client"


Another instance (compiled locally rather than from PGDG RPMs, and running the broken
commit rather than v16 HEAD):

==30181== Source and destination overlap in memcpy(0x17691078, 0x15f6f8e0, 92126790)
==30181==    at 0x4C2E81D: memcpy@@GLIBC_2.14 (vg_replace_strmem.c:1035)
==30181==    by 0x98C5DA: pg_detoast_datum_copy (in
/home/pryzbyj/git/postgresql/build.autoconf/tmp_install/usr/local/pgsql/bin/postgres)
==30181==    by 0x875ADC: expand_array (in
/home/pryzbyj/git/postgresql/build.autoconf/tmp_install/usr/local/pgsql/bin/postgres)
==30181==    by 0x174757B7: plpgsql_exec_function (in
/home/pryzbyj/git/postgresql/build.autoconf/tmp_install/usr/local/pgsql/lib/plpgsql.so)
==30181==    by 0x174806B5: plpgsql_call_handler (in
/home/pryzbyj/git/postgresql/build.autoconf/tmp_install/usr/local/pgsql/lib/plpgsql.so)
==30181==    by 0x694DBD: ExecInterpExpr (in
/home/pryzbyj/git/postgresql/build.autoconf/tmp_install/usr/local/pgsql/bin/postgres)
==30181==    by 0x69131A: ExecInterpExprStillValid (in
/home/pryzbyj/git/postgresql/build.autoconf/tmp_install/usr/local/pgsql/bin/postgres)
==30181==    by 0x6AEF2F: project_aggregates (in
/home/pryzbyj/git/postgresql/build.autoconf/tmp_install/usr/local/pgsql/bin/postgres)
==30181==    by 0x6B0169: agg_retrieve_direct (in
/home/pryzbyj/git/postgresql/build.autoconf/tmp_install/usr/local/pgsql/bin/postgres)
==30181==    by 0x6B0215: ExecAgg (in
/home/pryzbyj/git/postgresql/build.autoconf/tmp_install/usr/local/pgsql/bin/postgres)
==30181==    by 0x6A1637: ExecProcNodeFirst (in
/home/pryzbyj/git/postgresql/build.autoconf/tmp_install/usr/local/pgsql/bin/postgres)
==30181==    by 0x6998EC: ExecutePlan (in
/home/pryzbyj/git/postgresql/build.autoconf/tmp_install/usr/local/pgsql/bin/postgres)


==30181== Invalid read of size 8
==30181==    at 0x4C2EA0C: memcpy@@GLIBC_2.14 (vg_replace_strmem.c:1035)
==30181==    by 0x98C5DA: pg_detoast_datum_copy (in
/home/pryzbyj/git/postgresql/build.autoconf/tmp_install/usr/local/pgsql/bin/postgres)
==30181==    by 0x875ADC: expand_array (in
/home/pryzbyj/git/postgresql/build.autoconf/tmp_install/usr/local/pgsql/bin/postgres)
==30181==    by 0x174757B7: plpgsql_exec_function (in
/home/pryzbyj/git/postgresql/build.autoconf/tmp_install/usr/local/pgsql/lib/plpgsql.so)
==30181==    by 0x174806B5: plpgsql_call_handler (in
/home/pryzbyj/git/postgresql/build.autoconf/tmp_install/usr/local/pgsql/lib/plpgsql.so)
==30181==    by 0x694DBD: ExecInterpExpr (in
/home/pryzbyj/git/postgresql/build.autoconf/tmp_install/usr/local/pgsql/bin/postgres)
==30181==    by 0x69131A: ExecInterpExprStillValid (in
/home/pryzbyj/git/postgresql/build.autoconf/tmp_install/usr/local/pgsql/bin/postgres)
==30181==    by 0x6AEF2F: project_aggregates (in
/home/pryzbyj/git/postgresql/build.autoconf/tmp_install/usr/local/pgsql/bin/postgres)
==30181==    by 0x6B0169: agg_retrieve_direct (in
/home/pryzbyj/git/postgresql/build.autoconf/tmp_install/usr/local/pgsql/bin/postgres)
==30181==    by 0x6B0215: ExecAgg (in
/home/pryzbyj/git/postgresql/build.autoconf/tmp_install/usr/local/pgsql/bin/postgres)
==30181==    by 0x6A1637: ExecProcNodeFirst (in
/home/pryzbyj/git/postgresql/build.autoconf/tmp_install/usr/local/pgsql/bin/postgres)
==30181==    by 0x6998EC: ExecutePlan (in
/home/pryzbyj/git/postgresql/build.autoconf/tmp_install/usr/local/pgsql/bin/postgres)
==30181==  Address 0x17691038 is 8 bytes before a block of size 92,126,848 alloc'd
==30181==    at 0x4C29F73: malloc (vg_replace_malloc.c:309)
==30181==    by 0x9A7980: AllocSetAlloc (in
/home/pryzbyj/git/postgresql/build.autoconf/tmp_install/usr/local/pgsql/bin/postgres)
==30181==    by 0x9B01A7: palloc (in
/home/pryzbyj/git/postgresql/build.autoconf/tmp_install/usr/local/pgsql/bin/postgres)
==30181==    by 0x98C5C9: pg_detoast_datum_copy (in
/home/pryzbyj/git/postgresql/build.autoconf/tmp_install/usr/local/pgsql/bin/postgres)
==30181==    by 0x875ADC: expand_array (in
/home/pryzbyj/git/postgresql/build.autoconf/tmp_install/usr/local/pgsql/bin/postgres)
==30181==    by 0x174757B7: plpgsql_exec_function (in
/home/pryzbyj/git/postgresql/build.autoconf/tmp_install/usr/local/pgsql/lib/plpgsql.so)
==30181==    by 0x174806B5: plpgsql_call_handler (in
/home/pryzbyj/git/postgresql/build.autoconf/tmp_install/usr/local/pgsql/lib/plpgsql.so)
==30181==    by 0x694DBD: ExecInterpExpr (in
/home/pryzbyj/git/postgresql/build.autoconf/tmp_install/usr/local/pgsql/bin/postgres)
==30181==    by 0x69131A: ExecInterpExprStillValid (in
/home/pryzbyj/git/postgresql/build.autoconf/tmp_install/usr/local/pgsql/bin/postgres)
==30181==    by 0x6AEF2F: project_aggregates (in
/home/pryzbyj/git/postgresql/build.autoconf/tmp_install/usr/local/pgsql/bin/postgres)
==30181==    by 0x6B0169: agg_retrieve_direct (in
/home/pryzbyj/git/postgresql/build.autoconf/tmp_install/usr/local/pgsql/bin/postgres)
==30181==    by 0x6B0215: ExecAgg (in
/home/pryzbyj/git/postgresql/build.autoconf/tmp_install/usr/local/pgsql/bin/postgres)


==30181== Invalid read of size 8
==30181==    at 0x4C2EA18: memcpy@@GLIBC_2.14 (vg_replace_strmem.c:1035)
==30181==    by 0x98C5DA: pg_detoast_datum_copy (in
/home/pryzbyj/git/postgresql/build.autoconf/tmp_install/usr/local/pgsql/bin/postgres)
==30181==    by 0x875ADC: expand_array (in
/home/pryzbyj/git/postgresql/build.autoconf/tmp_install/usr/local/pgsql/bin/postgres)
==30181==    by 0x174757B7: plpgsql_exec_function (in
/home/pryzbyj/git/postgresql/build.autoconf/tmp_install/usr/local/pgsql/lib/plpgsql.so)
==30181==    by 0x174806B5: plpgsql_call_handler (in
/home/pryzbyj/git/postgresql/build.autoconf/tmp_install/usr/local/pgsql/lib/plpgsql.so)
==30181==    by 0x694DBD: ExecInterpExpr (in
/home/pryzbyj/git/postgresql/build.autoconf/tmp_install/usr/local/pgsql/bin/postgres)
==30181==    by 0x69131A: ExecInterpExprStillValid (in
/home/pryzbyj/git/postgresql/build.autoconf/tmp_install/usr/local/pgsql/bin/postgres)
==30181==    by 0x6AEF2F: project_aggregates (in
/home/pryzbyj/git/postgresql/build.autoconf/tmp_install/usr/local/pgsql/bin/postgres)
==30181==    by 0x6B0169: agg_retrieve_direct (in
/home/pryzbyj/git/postgresql/build.autoconf/tmp_install/usr/local/pgsql/bin/postgres)
==30181==    by 0x6B0215: ExecAgg (in
/home/pryzbyj/git/postgresql/build.autoconf/tmp_install/usr/local/pgsql/bin/postgres)
==30181==    by 0x6A1637: ExecProcNodeFirst (in
/home/pryzbyj/git/postgresql/build.autoconf/tmp_install/usr/local/pgsql/bin/postgres)
==30181==    by 0x6998EC: ExecutePlan (in
/home/pryzbyj/git/postgresql/build.autoconf/tmp_install/usr/local/pgsql/bin/postgres)
==30181==  Address 0x17691030 is 16 bytes before a block of size 92,126,848 alloc'd
==30181==    at 0x4C29F73: malloc (vg_replace_malloc.c:309)
==30181==    by 0x9A7980: AllocSetAlloc (in
/home/pryzbyj/git/postgresql/build.autoconf/tmp_install/usr/local/pgsql/bin/postgres)
==30181==    by 0x9B01A7: palloc (in
/home/pryzbyj/git/postgresql/build.autoconf/tmp_install/usr/local/pgsql/bin/postgres)
==30181==    by 0x98C5C9: pg_detoast_datum_copy (in
/home/pryzbyj/git/postgresql/build.autoconf/tmp_install/usr/local/pgsql/bin/postgres)
==30181==    by 0x875ADC: expand_array (in
/home/pryzbyj/git/postgresql/build.autoconf/tmp_install/usr/local/pgsql/bin/postgres)
==30181==    by 0x174757B7: plpgsql_exec_function (in
/home/pryzbyj/git/postgresql/build.autoconf/tmp_install/usr/local/pgsql/lib/plpgsql.so)
==30181==    by 0x174806B5: plpgsql_call_handler (in
/home/pryzbyj/git/postgresql/build.autoconf/tmp_install/usr/local/pgsql/lib/plpgsql.so)
==30181==    by 0x694DBD: ExecInterpExpr (in
/home/pryzbyj/git/postgresql/build.autoconf/tmp_install/usr/local/pgsql/bin/postgres)
==30181==    by 0x69131A: ExecInterpExprStillValid (in
/home/pryzbyj/git/postgresql/build.autoconf/tmp_install/usr/local/pgsql/bin/postgres)
==30181==    by 0x6AEF2F: project_aggregates (in
/home/pryzbyj/git/postgresql/build.autoconf/tmp_install/usr/local/pgsql/bin/postgres)
==30181==    by 0x6B0169: agg_retrieve_direct (in
/home/pryzbyj/git/postgresql/build.autoconf/tmp_install/usr/local/pgsql/bin/postgres)
==30181==    by 0x6B0215: ExecAgg (in
/home/pryzbyj/git/postgresql/build.autoconf/tmp_install/usr/local/pgsql/bin/postgres)

==30181== Invalid read of size 8
==30181==    at 0x4C2EA20: memcpy@@GLIBC_2.14 (vg_replace_strmem.c:1035)
==30181==    by 0x98C5DA: pg_detoast_datum_copy (in
/home/pryzbyj/git/postgresql/build.autoconf/tmp_install/usr/local/pgsql/bin/postgres)
==30181==    by 0x875ADC: expand_array (in
/home/pryzbyj/git/postgresql/build.autoconf/tmp_install/usr/local/pgsql/bin/postgres)
==30181==    by 0x174757B7: plpgsql_exec_function (in
/home/pryzbyj/git/postgresql/build.autoconf/tmp_install/usr/local/pgsql/lib/plpgsql.so)
==30181==    by 0x174806B5: plpgsql_call_handler (in
/home/pryzbyj/git/postgresql/build.autoconf/tmp_install/usr/local/pgsql/lib/plpgsql.so)
==30181==    by 0x694DBD: ExecInterpExpr (in
/home/pryzbyj/git/postgresql/build.autoconf/tmp_install/usr/local/pgsql/bin/postgres)
==30181==    by 0x69131A: ExecInterpExprStillValid (in
/home/pryzbyj/git/postgresql/build.autoconf/tmp_install/usr/local/pgsql/bin/postgres)
==30181==    by 0x6AEF2F: project_aggregates (in
/home/pryzbyj/git/postgresql/build.autoconf/tmp_install/usr/local/pgsql/bin/postgres)
==30181==    by 0x6B0169: agg_retrieve_direct (in
/home/pryzbyj/git/postgresql/build.autoconf/tmp_install/usr/local/pgsql/bin/postgres)
==30181==    by 0x6B0215: ExecAgg (in
/home/pryzbyj/git/postgresql/build.autoconf/tmp_install/usr/local/pgsql/bin/postgres)
==30181==    by 0x6A1637: ExecProcNodeFirst (in
/home/pryzbyj/git/postgresql/build.autoconf/tmp_install/usr/local/pgsql/bin/postgres)
==30181==    by 0x6998EC: ExecutePlan (in
/home/pryzbyj/git/postgresql/build.autoconf/tmp_install/usr/local/pgsql/bin/postgres)
==30181==  Address 0x17691028 is 24 bytes before a block of size 92,126,848 alloc'd
==30181==    at 0x4C29F73: malloc (vg_replace_malloc.c:309)
==30181==    by 0x9A7980: AllocSetAlloc (in
/home/pryzbyj/git/postgresql/build.autoconf/tmp_install/usr/local/pgsql/bin/postgres)
==30181==    by 0x9B01A7: palloc (in
/home/pryzbyj/git/postgresql/build.autoconf/tmp_install/usr/local/pgsql/bin/postgres)
==30181==    by 0x98C5C9: pg_detoast_datum_copy (in
/home/pryzbyj/git/postgresql/build.autoconf/tmp_install/usr/local/pgsql/bin/postgres)
==30181==    by 0x875ADC: expand_array (in
/home/pryzbyj/git/postgresql/build.autoconf/tmp_install/usr/local/pgsql/bin/postgres)
==30181==    by 0x174757B7: plpgsql_exec_function (in
/home/pryzbyj/git/postgresql/build.autoconf/tmp_install/usr/local/pgsql/lib/plpgsql.so)
==30181==    by 0x174806B5: plpgsql_call_handler (in
/home/pryzbyj/git/postgresql/build.autoconf/tmp_install/usr/local/pgsql/lib/plpgsql.so)
==30181==    by 0x694DBD: ExecInterpExpr (in
/home/pryzbyj/git/postgresql/build.autoconf/tmp_install/usr/local/pgsql/bin/postgres)
==30181==    by 0x69131A: ExecInterpExprStillValid (in
/home/pryzbyj/git/postgresql/build.autoconf/tmp_install/usr/local/pgsql/bin/postgres)
==30181==    by 0x6AEF2F: project_aggregates (in
/home/pryzbyj/git/postgresql/build.autoconf/tmp_install/usr/local/pgsql/bin/postgres)
==30181==    by 0x6B0169: agg_retrieve_direct (in
/home/pryzbyj/git/postgresql/build.autoconf/tmp_install/usr/local/pgsql/bin/postgres)
==30181==    by 0x6B0215: ExecAgg (in
/home/pryzbyj/git/postgresql/build.autoconf/tmp_install/usr/local/pgsql/bin/postgres)
==30181==
==30181== Invalid read of size 8
==30181==    at 0x4C2EA28: memcpy@@GLIBC_2.14 (vg_replace_strmem.c:1035)
==30181==    by 0x98C5DA: pg_detoast_datum_copy (in /home/pryzbyj/git/postgresql/build.autoconf/tmp_install/usr/local/pgsql/bin/postgres)
==30181==    by 0x875ADC: expand_array (in /home/pryzbyj/git/postgresql/build.autoconf/tmp_install/usr/local/pgsql/bin/postgres)
==30181==    by 0x174757B7: plpgsql_exec_function (in /home/pryzbyj/git/postgresql/build.autoconf/tmp_install/usr/local/pgsql/lib/plpgsql.so)
==30181==    by 0x174806B5: plpgsql_call_handler (in /home/pryzbyj/git/postgresql/build.autoconf/tmp_install/usr/local/pgsql/lib/plpgsql.so)
==30181==    by 0x694DBD: ExecInterpExpr (in /home/pryzbyj/git/postgresql/build.autoconf/tmp_install/usr/local/pgsql/bin/postgres)
==30181==    by 0x69131A: ExecInterpExprStillValid (in /home/pryzbyj/git/postgresql/build.autoconf/tmp_install/usr/local/pgsql/bin/postgres)
==30181==    by 0x6AEF2F: project_aggregates (in /home/pryzbyj/git/postgresql/build.autoconf/tmp_install/usr/local/pgsql/bin/postgres)
==30181==    by 0x6B0169: agg_retrieve_direct (in /home/pryzbyj/git/postgresql/build.autoconf/tmp_install/usr/local/pgsql/bin/postgres)
==30181==    by 0x6B0215: ExecAgg (in /home/pryzbyj/git/postgresql/build.autoconf/tmp_install/usr/local/pgsql/bin/postgres)
==30181==    by 0x6A1637: ExecProcNodeFirst (in /home/pryzbyj/git/postgresql/build.autoconf/tmp_install/usr/local/pgsql/bin/postgres)
==30181==    by 0x6998EC: ExecutePlan (in /home/pryzbyj/git/postgresql/build.autoconf/tmp_install/usr/local/pgsql/bin/postgres)
==30181==  Address 0x17691020 is 32 bytes before a block of size 92,127,136 in arena "client"



Re: v16dev: invalid memory alloc request size 8488348128

From
David Rowley
Date:
On Sat, 15 Apr 2023 at 13:03, Justin Pryzby <pryzby@telsasoft.com> wrote:
> Maybe you'll find valgrind errors to be helpful.

I don't think that's really going to help.  The crash already tells us
there's a problem down the line, but if the commit you mention is to
blame for this, then the problem is elsewhere, either in our
assumption that we can get away without the datumCopy() or in the
aggregate function producing the state that we're no longer copying.

David



Re: v16dev: invalid memory alloc request size 8488348128

From
Tom Lane
Date:
David Rowley <dgrowleyml@gmail.com> writes:
> I don't think that's really going to help.  The crash already tells us
> there's a problem down the line, but if the commit you mention is to
> blame for this, then the problem is elsewhere, either in our
> assumption that we can get away without the datumCopy() or in the
> aggregate function producing the state that we're no longer copying.

It does smell like the aggregate output has been corrupted by the time
it got to the plpgsql function.  I don't particularly want to try to
synthesize a test case from the essentially-zero SQL-level information
we've been provided, though.  And I doubt we can track this down without
a test case.  So please try to sanitize the case you have enough that
you can share it.

            regards, tom lane



Re: v16dev: invalid memory alloc request size 8488348128

From
Justin Pryzby
Date:
On Sat, Apr 15, 2023 at 11:33:58AM +1200, David Rowley wrote:
> On Sat, 15 Apr 2023 at 10:48, Justin Pryzby <pryzby@telsasoft.com> wrote:
> >
> > On Sat, Apr 15, 2023 at 10:04:52AM +1200, David Rowley wrote:
> > > Which aggregate function is being called here?  Is it a custom
> > > aggregate written in C, by any chance?
> >
> > That function is not an aggregate:
> 
> There's an aggregate somewhere as indicated by this fragment from the
> stack trace:
> 
> > #12 project_aggregates (aggstate=aggstate@entry=0x607200070d38) at ../src/backend/executor/nodeAgg.c:1377
> > #13 0x0000000000903eb6 in agg_retrieve_direct (aggstate=aggstate@entry=0x607200070d38) at ../src/backend/executor/nodeAgg.c:2520
> > #14 0x0000000000904074 in ExecAgg (pstate=0x607200070d38) at ../src/backend/executor/nodeAgg.c:2172
> 
> Any chance you could try and come up with a minimal reproducer?  You
> have access to see which aggregates are being used here and what data
> types are being given to them and then what's being done with the
> return value of that aggregate that's causing the crash.  Maybe you
> can still get the crash if you mock up some data to aggregate and
> strip out the guts from the plpgsql functions that we're crashing on?

Try this

Attachment

Re: v16dev: invalid memory alloc request size 8488348128

From
Tom Lane
Date:
Justin Pryzby <pryzby@telsasoft.com> writes:
> On Sat, Apr 15, 2023 at 11:33:58AM +1200, David Rowley wrote:
>> Any chance you could try and come up with a minimal reproducer?

> Try this

Thanks.  I see the problem: finalize_aggregate is no longer forcing
a R/W expanded datum returned by the finalfn into R/O form.  If
we re-use the aggregate result in multiple places, as this query
does, then the first use can clobber the value for later uses.
(The commit message specifically mentions this concern, so I wonder
how we failed to actually do it :-()

A minimal fix would be to force to R/O before returning from
finalize_aggregate, but I wonder if we should do it later.
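
The hazard described above can be illustrated with a toy model in C. This
is only a sketch: ToyArray, ToyDatum, consume, make_read_only and
second_use are all hypothetical names invented for illustration, and none
of this is PostgreSQL code. It assumes a consumer that is allowed to
scribble on a read/write value in place but must work on a private copy
of a read-only one. In PostgreSQL itself the MakeExpandedObjectReadOnly
macro plays roughly the role that make_read_only plays here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for an expanded array (hypothetical, not ExpandedArrayHeader). */
typedef struct ToyArray
{
    size_t nelems;
    double vals[4];
} ToyArray;

/* Toy stand-in for a Datum that points at an expanded object. */
typedef struct ToyDatum
{
    ToyArray *obj;
    bool      read_write;   /* true: the receiver may modify obj in place */
} ToyDatum;

/*
 * Hypothetical consumer: sums the array, then empties the array it worked on.
 * With a R/W datum it works directly on the caller's object; with a R/O
 * datum it first makes a private copy, so the caller's object is untouched.
 */
static double
consume(ToyDatum d)
{
    ToyArray  local;
    ToyArray *a = d.obj;
    double    sum = 0.0;

    assert(d.obj != NULL);
    if (!d.read_write)
    {
        local = *d.obj;     /* read-only input: copy before modifying */
        a = &local;
    }
    for (size_t i = 0; i < a->nelems; i++)
        sum += a->vals[i];
    a->nelems = 0;          /* in-place modification */
    return sum;
}

/* Downgrade a datum to read-only form; the pointed-to object is unchanged. */
static ToyDatum
make_read_only(ToyDatum d)
{
    d.read_write = false;
    return d;
}

/*
 * Use the same "aggregate result" twice and return what the second use saw.
 * force_read_only models forcing the result to R/O before it is handed out.
 */
static double
second_use(ToyArray *result, bool force_read_only)
{
    ToyDatum d = { result, true };  /* the final function returned a R/W datum */

    if (force_read_only)
        d = make_read_only(d);
    (void) consume(d);              /* first use */
    return consume(d);              /* second use of the same datum */
}
```

Calling second_use on an array holding 1.0, 2.0, 3.0 with
force_read_only = false makes the second use see an emptied array,
because the first use modified the shared object. With
force_read_only = true both uses see the original three elements.
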

By the by, I couldn't help noticing that ExecAggTransReparent
completely fails to do what its name promises it should do, i.e.
reparent a R/W datum into the proper context instead of physically
copying it.  That looks suspiciously like something that got broken
during some other refactoring somewhere along the line.  That'd be a
performance bug not a correctness bug, but it should be looked into.

            regards, tom lane