Re: pg13.2: invalid memory alloc request size NNNN - Mailing list pgsql-hackers

From Tomas Vondra
Subject Re: pg13.2: invalid memory alloc request size NNNN
Date
Msg-id c27089c6-3a23-1769-6ec9-9012fef5d3b1@enterprisedb.com
In response to pg13.2: invalid memory alloc request size NNNN  (Justin Pryzby <pryzby@telsasoft.com>)
Responses Re: pg13.2: invalid memory alloc request size NNNN
Re: pg13.2: invalid memory alloc request size NNNN
List pgsql-hackers

On 2/12/21 2:48 AM, Justin Pryzby wrote:
> ts=# \errverbose
> ERROR:  XX000: invalid memory alloc request size 18446744073709551613
> 
> #0  pg_re_throw () at elog.c:1716
> #1  0x0000000000a33b12 in errfinish (filename=0xbff20e "mcxt.c", lineno=959, funcname=0xbff2db <__func__.6684> "palloc") at elog.c:502
> #2  0x0000000000a6760d in palloc (size=18446744073709551613) at mcxt.c:959
> #3  0x00000000009fb149 in text_to_cstring (t=0x2aaae8023010) at varlena.c:212
> #4  0x00000000009fbf05 in textout (fcinfo=0x2094538) at varlena.c:557
> #5  0x00000000006bdd50 in ExecInterpExpr (state=0x2093990, econtext=0x20933d8, isnull=0x7fff5bf04a87) at execExprInterp.c:1112
> #6  0x00000000006d4f18 in ExecEvalExprSwitchContext (state=0x2093990, econtext=0x20933d8, isNull=0x7fff5bf04a87) at ../../../src/include/executor/executor.h:316
> #7  0x00000000006d4f81 in ExecProject (projInfo=0x2093988) at ../../../src/include/executor/executor.h:350
> #8  0x00000000006d5371 in ExecScan (node=0x20932c8, accessMtd=0x7082e0 <SeqNext>, recheckMtd=0x708385 <SeqRecheck>) at execScan.c:238
> #9  0x00000000007083c2 in ExecSeqScan (pstate=0x20932c8) at nodeSeqscan.c:112
> #10 0x00000000006d1b00 in ExecProcNodeInstr (node=0x20932c8) at execProcnode.c:466
> #11 0x00000000006e742c in ExecProcNode (node=0x20932c8) at ../../../src/include/executor/executor.h:248
> #12 0x00000000006e77de in ExecAppend (pstate=0x2089208) at nodeAppend.c:267
> #13 0x00000000006d1b00 in ExecProcNodeInstr (node=0x2089208) at execProcnode.c:466
> #14 0x000000000070964f in ExecProcNode (node=0x2089208) at ../../../src/include/executor/executor.h:248
> #15 0x0000000000709795 in ExecSort (pstate=0x2088ff8) at nodeSort.c:108
> #16 0x00000000006d1b00 in ExecProcNodeInstr (node=0x2088ff8) at execProcnode.c:466
> #17 0x00000000006d1ad1 in ExecProcNodeFirst (node=0x2088ff8) at execProcnode.c:450
> #18 0x00000000006dec36 in ExecProcNode (node=0x2088ff8) at ../../../src/include/executor/executor.h:248
> #19 0x00000000006df079 in fetch_input_tuple (aggstate=0x2088a20) at nodeAgg.c:589
> #20 0x00000000006e1fad in agg_retrieve_direct (aggstate=0x2088a20) at nodeAgg.c:2368
> #21 0x00000000006e1bfd in ExecAgg (pstate=0x2088a20) at nodeAgg.c:2183
> #22 0x00000000006d1b00 in ExecProcNodeInstr (node=0x2088a20) at execProcnode.c:466
> #23 0x00000000006d1ad1 in ExecProcNodeFirst (node=0x2088a20) at execProcnode.c:450
> #24 0x00000000006c6ffa in ExecProcNode (node=0x2088a20) at ../../../src/include/executor/executor.h:248
> #25 0x00000000006c966b in ExecutePlan (estate=0x2032f48, planstate=0x2088a20, use_parallel_mode=false, operation=CMD_SELECT, sendTuples=true, numberTuples=0, direction=ForwardScanDirection, dest=0xbb3400 <donothingDR>, execute_once=true) at execMain.c:1632
> 
> #3  0x00000000009fb149 in text_to_cstring (t=0x2aaae8023010) at varlena.c:212
> 212             result = (char *) palloc(len + 1);
> 
> (gdb) l
> 207             /* must cast away the const, unfortunately */
> 208             text       *tunpacked = pg_detoast_datum_packed(unconstify(text *, t));
> 209             int                     len = VARSIZE_ANY_EXHDR(tunpacked);
> 210             char       *result;
> 211
> 212             result = (char *) palloc(len + 1);
> 
> (gdb) p len
> $1 = -4
> 
> This VM had some issue early today and I killed the VM, causing PG to execute
> recovery.  I'm tentatively blaming that on zfs, so this could conceivably be a
> data error (although recovery supposedly would have resolved it).  I just
> checked and data_checksums=off.
> 

This seems very much like a corrupted varlena header - the length (-4) is 
clearly bogus, and it's what triggers the problem: len + 1 is -3, which 
wraps around to 18446744073709551613 (0xFFFFFFFFFFFFFFFD) once it is 
converted to the unsigned size passed to palloc.
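
To make the arithmetic concrete, here is a minimal sketch of the wraparound, 
assuming a 64-bit size_t. The names alloc_request_size, alloc_size_is_valid 
and MAX_ALLOC_SIZE are made up for illustration; they only mimic the int to 
size conversion in the palloc(len + 1) call and the 1 GB - 1 limit that 
palloc enforces, and are not the actual PostgreSQL code.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed to mirror palloc's upper limit of 1 GB - 1 (illustrative name). */
#define MAX_ALLOC_SIZE ((size_t) 0x3fffffff)

/*
 * Hypothetical helper: the size that palloc(len + 1) would receive for a
 * given payload length. len + 1 is computed as an int and then converted
 * to size_t, so a negative result wraps modulo 2^64.
 */
static size_t
alloc_request_size(int len)
{
    assert(sizeof(size_t) == 8);    /* the values quoted above assume 64 bits */
    return (size_t) (len + 1);
}

/* Reject any request above the limit, as the allocator's sanity check does. */
static int
alloc_size_is_valid(size_t size)
{
    return size <= MAX_ALLOC_SIZE;
}
```

With len = -4, alloc_request_size(-4) gives 0xFFFFFFFFFFFFFFFD, and 
alloc_size_is_valid() rejects it, which corresponds to the "invalid memory 
alloc request size" error rather than a crash.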

This has to be a value stored in a table, not some intermediate value 
created during execution. So I don't think the exact query matters. Can 
you try doing something like pg_dump, which has to detoast everything?

The question is whether this is due to the VM getting killed in some 
strange way (what VM system is this, how is the storage mounted?) or 
whether the recovery is borked and failed to do the right thing.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


