Re: BUG #16213: segfault when running a query - Mailing list pgsql-bugs
From | Tom Lane |
---|---|
Subject | Re: BUG #16213: segfault when running a query |
Date | |
Msg-id | 635.1579233826@sss.pgh.pa.us |
In response to | BUG #16213: segfault when running a query (PG Bug reporting form <noreply@postgresql.org>) |
List | pgsql-bugs |
PG Bug reporting form <noreply@postgresql.org> writes:
> The above query produces an error in the server log:
> LOG: server process (PID 108) was terminated by signal 11: Segmentation
> fault

Yeah, duplicated here.  (For anyone following along at home: you can
"create domain string as text", or just s/STRING/TEXT/g in the given
query.  That type's not relevant to the problem.)  The problem is probably
much more easily reproduced in a debug build, because it boils down to a
dangling-pointer bug.  I duplicated it back to 9.4, and it's probably
older than that.

The direct cause of the crash is that by the time we get to ExecutorEnd,
there are dangling pointers in the es_tupleTable list.  Those pointers
turn out to originate from ExecInitSubPlan's creation of TupleTableSlots
for the ProjectionInfo objects it creates when doing hashing.  And the
reason they're dangling is that the subplan is inside a VALUES list, and
nodeValuesscan.c does this remarkably bletcherous thing:

	 * Build the expression eval state in the econtext's per-tuple memory.
	 * This is a tad unusual, but we want to delete the eval state again
	 * when we move to the next row, to avoid growth of memory
	 * requirements over a long values list.

It turns out that just below that, there's already some messy hacking to
deal with subplans in the VALUES list.  But I guess we'd not hit the case
of a subplan using hashing within VALUES.

The attached draft patch fixes this by not letting ExecInitSubPlan hook
the slots it's making into the outer es_tupleTable list.  Ordinarily that
would be bad because it exposes us to leaking resources, if the slots
aren't cleared before ending execution.  But nodeSubplan.c is already
being (mostly) careful to clear those slots promptly, so it doesn't cost
us anything much to lose this backstop.

What that change fails to do is guarantee that there are no such bugs
elsewhere.  In the attached I made nodeValuesscan.c assert that nothing
has added slots to the es_tupleTable list, but of course that only
catches cases where there's a live bug.  Given how long this case escaped
detection, I don't feel like that's quite enough.  (Unsurprisingly, the
existing regression tests don't trigger this assert, even without the
nodeSubplan.c fix.)

Another angle I've not run to ground is that the comments for the
existing subplan-related hacking in nodeValuesscan.c claim that a
problematic subplan could only occur in conjunction with LATERAL.  But
there's no LATERAL in this example --- are those comments wrong or
obsolete, or just talking about a different case?

I didn't work on making a minimal test case for the regression tests,
either.

Anyway, thanks for the report!

			regards, tom lane

diff --git a/src/backend/executor/nodeSubplan.c b/src/backend/executor/nodeSubplan.c
index ff95317..8a4084b 100644
--- a/src/backend/executor/nodeSubplan.c
+++ b/src/backend/executor/nodeSubplan.c
@@ -133,11 +133,15 @@ ExecHashSubPlan(SubPlanState *node,
 	slot = ExecProject(node->projLeft);
 
 	/*
-	 * Note: because we are typically called in a per-tuple context, we have
-	 * to explicitly clear the projected tuple before returning. Otherwise,
-	 * we'll have a double-free situation: the per-tuple context will probably
-	 * be reset before we're called again, and then the tuple slot will think
-	 * it still needs to free the tuple.
+	 * Note: we have to explicitly clear the projected tuple before returning.
+	 * This guards against two different hazards.  First, because we are
+	 * typically called in a per-tuple context, the data that was stored into
+	 * the slot may be shorter-lived than the slot.  If that data is cleared
+	 * before we're called again, but the tuple slot still thinks it needs to
+	 * free the tuple, we'll have a double-free crash.  Second, because the
+	 * slot isn't hooked into any estate tuple-table list, there isn't anyone
+	 * else who's going to clear it before executor shutdown.  So we risk
+	 * leaking buffer pins or the like if we don't clear it here.
 	 */
 
 	/*
@@ -605,21 +609,17 @@ buildSubPlanHash(SubPlanState *node, ExprContext *econtext)
 		}
 
 		/*
+		 * As with projLeft, we must manually clear the slot after each use.
+		 */
+		ExecClearTuple(slot);
+
+		/*
 		 * Reset innerecontext after each inner tuple to free any memory used
 		 * during ExecProject.
 		 */
 		ResetExprContext(innerecontext);
 	}
 
-	/*
-	 * Since the projected tuples are in the sub-query's context and not the
-	 * main context, we'd better clear the tuple slot before there's any
-	 * chance of a reset of the sub-query's context.  Else we will have the
-	 * potential for a double free attempt.  (XXX possibly no longer needed,
-	 * but can't hurt.)
-	 */
-	ExecClearTuple(node->projRight->pi_state.resultslot);
-
 	MemoryContextSwitchTo(oldcontext);
 }
 
@@ -989,9 +989,15 @@ ExecInitSubPlan(SubPlan *subplan, PlanState *parent)
 		 * Fortunately we can just pass NULL for now and fill it in later
 		 * (hack alert!).  The righthand expressions will be evaluated in our
 		 * own innerecontext.
+		 *
+		 * We intentionally don't add the slots to the estate's es_tupleTable.
+		 * That's to support execution of subplans within short-lived
+		 * contexts: these slots might get freed before the estate ends
+		 * execution.  Because of this, we have to be careful to explicitly
+		 * clear each slot as soon as we're done with its current value.
 		 */
 		tupDescLeft = ExecTypeFromTL(lefttlist);
-		slot = ExecInitExtraTupleSlot(estate, tupDescLeft, &TTSOpsVirtual);
+		slot = MakeTupleTableSlot(tupDescLeft, &TTSOpsVirtual);
 		sstate->projLeft = ExecBuildProjectionInfo(lefttlist,
 												   NULL,
 												   slot,
@@ -999,7 +1005,7 @@ ExecInitSubPlan(SubPlan *subplan, PlanState *parent)
 												   NULL);
 
 		sstate->descRight = tupDescRight = ExecTypeFromTL(righttlist);
-		slot = ExecInitExtraTupleSlot(estate, tupDescRight, &TTSOpsVirtual);
+		slot = MakeTupleTableSlot(tupDescRight, &TTSOpsVirtual);
 		sstate->projRight = ExecBuildProjectionInfo(righttlist,
 													sstate->innerecontext,
 													slot,
diff --git a/src/backend/executor/nodeValuesscan.c b/src/backend/executor/nodeValuesscan.c
index 8d916a0..f104d8c 100644
--- a/src/backend/executor/nodeValuesscan.c
+++ b/src/backend/executor/nodeValuesscan.c
@@ -94,6 +94,7 @@ ValuesNext(ValuesScanState *node)
 	{
 		MemoryContext oldContext;
 		List	   *oldsubplans;
+		List	   *oldtupletable;
 		List	   *exprstatelist;
 		Datum	   *values;
 		bool	   *isnull;
@@ -131,15 +132,38 @@ ValuesNext(ValuesScanState *node)
 		node->ss.ps.subPlan = NIL;
 
 		/*
+		 * A related problem is that SubPlans might try to create tuple slots
+		 * and attach them to the outer estate's es_tupleTable; that would
+		 * result in dangling pointers in the es_tupleTable after we delete
+		 * the eval state.  We can prevent the dangling-pointer problem by
+		 * similarly saving and restoring the es_tupleTable list.  This is
+		 * only a partial solution though: any slots thereby excluded from the
+		 * outer estate might result in resource leaks, if they're not cleared
+		 * before ending execution.  Therefore, we decree that SubPlans
+		 * mustn't add slots to the outer es_tupleTable.  The Assert below
+		 * checks that.  It's only an Assert because resource leakage is way
+		 * less problematic than a dangling pointer.
+		 */
+		oldtupletable = econtext->ecxt_estate->es_tupleTable;
+		econtext->ecxt_estate->es_tupleTable = NIL;
+
+		/*
 		 * As the expressions are only ever used once, disable JIT for them.
 		 * This is worthwhile because it's common to insert significant
 		 * amounts of data via VALUES().
 		 */
 		saved_jit_flags = econtext->ecxt_estate->es_jit_flags;
 		econtext->ecxt_estate->es_jit_flags = PGJIT_NONE;
+
+		/* OK, finally we can init the expressions */
 		exprstatelist = ExecInitExprList(exprlist, &node->ss.ps);
+
+		/* Undo the hacks above */
 		econtext->ecxt_estate->es_jit_flags = saved_jit_flags;
+		Assert(econtext->ecxt_estate->es_tupleTable == NIL);
+		econtext->ecxt_estate->es_tupleTable = oldtupletable;
+
 		node->ss.ps.subPlan = oldsubplans;
 
 		/* parser should have checked all sublists are the same length */
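
To make the triggering construct concrete, here is a hypothetical sketch of
the general query shape under discussion: an uncorrelated IN/ANY subquery
(the kind the planner can execute as a hashed SubPlan) buried inside a
VALUES list that becomes a ValuesScan.  This is not the reporter's query,
which is not shown in this message; the recreated STRING domain and the
generate_series() data are stand-ins, and the sketch has not been verified
to choose a hashed SubPlan or to crash an unpatched server.

    -- hypothetical sketch only; names and data are illustrative
    create domain string as text;

    select *
    from (values
            ('a'::string in (select md5(g::text)::string
                             from generate_series(1, 1000) g)),
            ('b'::string in (select md5(g::text)::string
                             from generate_series(1, 1000) g))
         ) as v(found);

Whether a given query actually takes this path can be checked with EXPLAIN:
a plan that hashes the subquery shows it as a "hashed SubPlan" node rather
than a plain "SubPlan".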