Re: aggregate crash - Mailing list pgsql-hackers

From Andres Freund
Subject Re: aggregate crash
Date
Msg-id 20200103193539.mzimh6ufogyppsrl@alap3.anarazel.de
Whole thread Raw
In response to aggregate crash  (Teodor Sigaev <teodor@sigaev.ru>)
List pgsql-hackers
Hi,

On 2019-12-27 20:13:26 +0300, Teodor Sigaev wrote:
> Found crash on production instance, assert-enabled build crashes in pfree()
> call, with default config. v11, v12 and head are affected, but, seems, you
> need to be a bit lucky.
> 
> The bug is comparing old and new aggregate pass-by-ref values only by
> pointer value itself, despite on null flag. Any function which returns null
> doesn't worry about actual returned Datum value, so that comparison isn't
> enough. Test case shows bug with ExecInterpExpr() but there several similar
> places (thanks Nikita Glukhov for help).
> Attached patch adds check of null flag.

Hm. I don't understand the problem here. Why do we need to reparent in
that case? What freed the relevant value?

Nor do I really understand why v10 wouldn't be affected if this actually
is a problem. The relevant code is also only guarded by
        DatumGetPointer(newVal) != DatumGetPointer(pergroupstate->transValue))


> 
> Backtrace from v12 (note, newValue and oldValue are differ on current call,
> but oldValue points into pfreed memory) :
> #0  0x0000000000c8405a in GetMemoryChunkContext (pointer=0x80a808250) at
> ../../../../src/include/utils/memutils.h:130
> 130             AssertArg(MemoryContextIsValid(context));
> (gdb) bt
> #0  0x0000000000c8405a in GetMemoryChunkContext (pointer=0x80a808250) at
> ../../../../src/include/utils/memutils.h:130
> #1  0x0000000000c85ae5 in pfree (pointer=0x80a808250) at mcxt.c:1058
> #2  0x000000000080475e in ExecAggTransReparent (aggstate=0x80a806370,
> pertrans=0x80a87e830, newValue=34535940744, newValueIsNull=false,
> oldValue=34535932496, oldValueIsNull=false)
>     at execExprInterp.c:4209
> #3  0x00000000007ff51f in ExecInterpExpr (state=0x80a87f4d8,
> econtext=0x80a8065a8, isnull=0x7fffffffd7b7) at execExprInterp.c:1747
> #4  0x000000000082c12b in ExecEvalExprSwitchContext (state=0x80a87f4d8,
> econtext=0x80a8065a8, isNull=0x7fffffffd7b7) at
> ../../../src/include/executor/executor.h:308
> #5  0x000000000082bc0f in advance_aggregates (aggstate=0x80a806370) at nodeAgg.c:679
> #6  0x000000000082b8a6 in agg_retrieve_direct (aggstate=0x80a806370) at
> nodeAgg.c:1847
> #7  0x0000000000828782 in ExecAgg (pstate=0x80a806370) at nodeAgg.c:1572
> #8  0x000000000080e712 in ExecProcNode (node=0x80a806370) at
> ../../../src/include/executor/executor.h:240



> How to reproduce:
> http://sigaev.ru/misc/xdump.sql.bz2
> bzcat xdump.sql.bz2 | psql postgres && psql postgres < x.sql

It should be possible to create a smaller reproducer... It'd be good if
a bug fix for this were committed with a regression test.


> diff --git a/src/backend/executor/execExprInterp.c b/src/backend/executor/execExprInterp.c
> index 034970648f3..3b5333716d4 100644
> --- a/src/backend/executor/execExprInterp.c
> +++ b/src/backend/executor/execExprInterp.c
> @@ -1743,7 +1743,8 @@ ExecInterpExpr(ExprState *state, ExprContext *econtext, bool *isnull)
>               * expanded object that is already a child of the aggcontext,
>               * assume we can adopt that value without copying it.
>               */
> -            if (DatumGetPointer(newVal) != DatumGetPointer(pergroup->transValue))
> +            if (DatumGetPointer(newVal) != DatumGetPointer(pergroup->transValue) ||
> +                fcinfo->isnull != pergroup->transValueIsNull)
>                  newVal = ExecAggTransReparent(aggstate, pertrans,
>                                                newVal, fcinfo->isnull,
>                                                pergroup->transValue,

I'd really like to avoid adding additional branches to these paths.

Greetings,

Andres Freund



pgsql-hackers by date:

Previous
From: Robert Haas
Date:
Subject: Re: Greatest Common Divisor
Next
From: Stephen Frost
Date:
Subject: Re: backup manifests