Re: gs_group_1 crashing on 13beta2/s390x - Mailing list pgsql-hackers

From Ranier Vilela
Subject Re: gs_group_1 crashing on 13beta2/s390x
Date
Msg-id CAEudQArWoTTCXy0q7E-QtevPA0cW0cebZOOunyEJ2GbzoVDf0A@mail.gmail.com
Whole thread Raw
In response to Re: gs_group_1 crashing on 13beta2/s390x  (Ranier Vilela <ranier.vf@gmail.com>)
List pgsql-hackers
Em sex., 25 de set. de 2020 às 14:36, Ranier Vilela <ranier.vf@gmail.com> escreveu:
Em sex., 25 de set. de 2020 às 11:30, Christoph Berg <myon@debian.org> escreveu:
Re: Tom Lane
> >  Tom> It's hardly surprising that datumCopy would segfault when given a
> >  Tom> null "value" and told it is pass-by-reference. However, to get to
> >  Tom> the datumCopy call, we must have passed the MemoryContextContains
> >  Tom> check on that very same pointer value, and that would surely have
> >  Tom> segfaulted as well, one would think.
>
> > Nope, because MemoryContextContains just returns "false" if passed a
> > NULL pointer.
>
> Ah, right.  So you could imagine getting here if the finalfn had returned
> PointerGetDatum(NULL) with isnull = false.  We have some aggregate
> transfns that are capable of doing that for internal-type transvalues,
> I think, but the finalfn never should do it.

So I had another stab at this. As expected, the 13.0 upload to
Debian/unstable crashed again on the buildd, while a manual
everything-should-be-the-same build succeeded. I don't know why I
didn't try this before, but this time I took this manual build and
started a PG instance from it. Pasting the gs_group_1 queries made it
segfault instantly.

So here we are:

#0  datumCopy (value=0, typLen=-1, typByVal=false) at ./build/../src/backend/utils/adt/datum.c:142
#1  0x000002aa3bf6322e in datumCopy (value=<optimized out>, typByVal=<optimized out>, typLen=<optimized out>)
    at ./build/../src/backend/utils/adt/datum.c:178
#2  0x000002aa3bda6dd6 in finalize_aggregate (aggstate=aggstate@entry=0x2aa3defbfd0, peragg=peragg@entry=0x2aa3e0671f0,
    pergroupstate=pergroupstate@entry=0x2aa3e026b78, resultVal=resultVal@entry=0x2aa3e067108, resultIsNull=0x2aa3e06712a)
    at ./build/../src/backend/executor/nodeAgg.c:1153

(gdb) p *resultVal
$3 = 0
(gdb) p *resultIsNull
$6 = false

(gdb) p *peragg
$7 = {aggref = 0x2aa3deef218, transno = 2, finalfn_oid = 0, finalfn = {fn_addr = 0x0, fn_oid = 0, fn_nargs = 0, fn_strict = false,
    fn_retset = false, fn_stats = 0 '\000', fn_extra = 0x0, fn_mcxt = 0x0, fn_expr = 0x0}, numFinalArgs = 1, aggdirectargs = 0x0,
  resulttypeLen = -1, resulttypeByVal = false, shareable = true}

Since finalfn_oid is 0, resultVal/resultIsNull were set by the `else`
branch of the if (OidIsValid) in finalize_aggregate():

    else
    {
        /* Don't need MakeExpandedObjectReadOnly; datumCopy will copy it */
        *resultVal = pergroupstate->transValue;
        *resultIsNull = pergroupstate->transValueIsNull;
    }

(gdb) p *pergroupstate
$12 = {transValue = 0, transValueIsNull = false, noTransValue = false}
Here transValueIsNull shouldn't be "true"?
thus, DatumCopy would be protected, for this test: "!*resultIsNull"
Observe this excerpt (line 1129):
/* don't call a strict function with NULL inputs */
*resultVal = (Datum) 0;
*resultIsNull = true;

Now, it does not contradict this principle.
If all the values are null, they should be filled with True (1),
and not 0 (false)?

Line (4711), function ExecReScanAgg:
MemSet(econtext->ecxt_aggvalues, 0, sizeof(Datum) * node->numaggs);
MemSet(econtext->ecxt_aggnulls, true, sizeof(bool) * node->numaggs);

zero, here, mean False, aggvalues is Null? Not.

regards,
Ranier Vilela

pgsql-hackers by date:

Previous
From: Grigory Smolkin
Date:
Subject: Re: history file on replica and double switchover
Next
From: Tom Lane
Date:
Subject: Re: gs_group_1 crashing on 13beta2/s390x