Server crash due to assertion failure in CheckOpSlotCompatibility() - Mailing list pgsql-hackers

From Ashutosh Sharma
Subject Server crash due to assertion failure in CheckOpSlotCompatibility()
Date
Msg-id CAE9k0PmNaMD2oHTEAhRyxnxpaDaYkuBYkLa1dpOpn=RS0iS2AQ@mail.gmail.com
List pgsql-hackers
Hi All,

I'm getting a server crash when executing the following test case:

create table t1(a int primary key, b text);
insert into t1 values (1, 'aa'), (2, 'bb'), (3, 'aa'), (4, 'bb');
select a, b, array_agg(a order by a) from t1 group by grouping sets ((a), (b));

Backtrace:
#0  0x00007f37d0630277 in raise () from /lib64/libc.so.6
#1  0x00007f37d0631968 in abort () from /lib64/libc.so.6
#2  0x0000000000a5685e in ExceptionalCondition (conditionName=0xc29fd0 "!(op->d.fetch.kind == slot->tts_ops)", errorType=0xc29cc1 "FailedAssertion",
    fileName=0xc29d09 "execExprInterp.c", lineNumber=1905) at assert.c:54
#3  0x00000000006dfa2b in CheckOpSlotCompatibility (op=0x2e84e38, slot=0x2e6e268) at execExprInterp.c:1905
#4  0x00000000006dd446 in ExecInterpExpr (state=0x2e84da0, econtext=0x2e6d8e8, isnull=0x7ffe53cba4af) at execExprInterp.c:439
#5  0x00000000007010e5 in ExecEvalExprSwitchContext (state=0x2e84da0, econtext=0x2e6d8e8, isNull=0x7ffe53cba4af)
    at ../../../src/include/executor/executor.h:307
#6  0x0000000000701be7 in advance_aggregates (aggstate=0x2e6d6b0) at nodeAgg.c:679
#7  0x0000000000703a5d in agg_retrieve_direct (aggstate=0x2e6d6b0) at nodeAgg.c:1847
#8  0x00000000007034da in ExecAgg (pstate=0x2e6d6b0) at nodeAgg.c:1572
#9  0x00000000006e797f in ExecProcNode (node=0x2e6d6b0) at ../../../src/include/executor/executor.h:239
#10 0x00000000006ea174 in ExecutePlan (estate=0x2e6d458, planstate=0x2e6d6b0, use_parallel_mode=false, operation=CMD_SELECT, sendTuples=true,
    numberTuples=0, direction=ForwardScanDirection, dest=0x2e76b30, execute_once=true) at execMain.c:1648
#11 0x00000000006e7f91 in standard_ExecutorRun (queryDesc=0x2e7b3b8, direction=ForwardScanDirection, count=0, execute_once=true) at execMain.c:365
#12 0x00000000006e7dc7 in ExecutorRun (queryDesc=0x2e7b3b8, direction=ForwardScanDirection, count=0, execute_once=true) at execMain.c:309
#13 0x00000000008e40c7 in PortalRunSelect (portal=0x2e10bc8, forward=true, count=0, dest=0x2e76b30) at pquery.c:929
#14 0x00000000008e3d66 in PortalRun (portal=0x2e10bc8, count=9223372036854775807, isTopLevel=true, run_once=true, dest=0x2e76b30, altdest=0x2e76b30,
    completionTag=0x7ffe53cba850 "") at pquery.c:770

The following Assert statement in CheckOpSlotCompatibility() fails.

1905            Assert(op->d.fetch.kind == slot->tts_ops);

This Assert was added by you as part of the following git commit:

commit 15d8f83128e15de97de61430d0b9569f5ebecc26
Author: Andres Freund <andres@anarazel.de>
Date:   Thu Nov 15 22:00:30 2018 -0800

    Verify that expected slot types match returned slot types.
   
    This is important so JIT compilation knows what kind of tuple slot the
    deforming routine can expect. There's also optimization potential for
    expression initialization without JIT compilation. It e.g. seems
    plausible to elide EEOP_*_FETCHSOME ops entirely when dealing with
    virtual slots.
   
    Author: Andres Freund

Analysis:
I did some quick investigation and found that when the aggregate is computed for the first grouping set, i.e. GROUP BY a, all input tuples are fetched from the outer plan and are also stored in the tuplesort object. For the subsequent grouping sets, i.e. from the second one onwards, the tuples stored in the tuplesort object during the first phase are used. However, the tuples stored in the tuplesort object are minimal tuples, whereas a heap tuple is expected, and this mismatch is what triggers the assertion failure.

I might be wrong, but it seems to me that the slot fetched from the tuplesort object needs to be converted to a heap tuple. Currently, the following lines of code in agg_retrieve_direct() are executed only when we have crossed a group boundary. I think at least the call to ExecCopySlotHeapTuple(outerslot), followed by ExecForceStoreHeapTuple(), should always happen, irrespective of whether the group boundary has been crossed or not... Sorry if I'm saying something ...

        /*
         * If we are grouping, check whether we've crossed a group
         * boundary.
         */
        if (node->aggstrategy != AGG_PLAIN)
        {
            tmpcontext->ecxt_innertuple = firstSlot;
            if (!ExecQual(aggstate->phase->eqfunctions[node->numCols - 1],
                          tmpcontext))
            {
                aggstate->grp_firstTuple = ExecCopySlotHeapTuple(outerslot);
                break;
            }
        }

--
With Regards,
Ashutosh Sharma
