Thread: SQL functions: avoid making a tuplestore unnecessarily

SQL functions: avoid making a tuplestore unnecessarily

From

Tom Lane

Date:

17 April, 22:59:28

(I'm not proposing this for v18, but I wanted to get the patch
written while functions.c is still fresh in mind.)

The attached patch changes functions.c so that we only make a
tuplestore object if we're actually going to return multiple
rows in it, that is if it's a set-returning function and we
cannot use "lazyEval" mode.  Otherwise, we can just rely on
the result slot we made anyway to hold the one tuple we need.

This saves a small number of cycles by not shoving a tuple into the
tuplestore only to pull it right back out.  But the real reason for
changing it is not speed but resource-management worries.  The
existing code (as it was before 0dca5d68d and is again as of
0313c5dc6) makes the tuplestore in the potentially long-lived fn_mcxt.
This'd be merely a slight annoyance if a tuplestore were purely a
memory object, but it isn't: it might contain an open file.  If it
does, that file reference will be backed by a ResourceOwner,
specifically whichever ResourceOwner was CurrentResourceOwner at the
time of creating the tuplestore.  What I am afraid of is that I don't
think there's any guarantee that that ResourceOwner is as long-lived
as the FmgrInfo.  It should be fine within normal SQL query execution,
where the FmgrInfo will be part of the executor's state tree.  But
there are long-lived FmgrInfos, such as those within the typcache, or
within an index's relcache entry.  What if somebody tries to use a
SQL function to implement functionality that's reached through those
mechanisms?

Given the lack of field complaints over the many years it's been
like this, there doesn't seem to be a live problem.  I think the
explanation for that is
(1) those mechanisms are never used to call set-returning functions,
(2) therefore, the tuplestore will not be called on to hold more
than one result row,
(3) therefore, it won't get large enough that it wants to spill
to disk,
(4) therefore, its potentially dangling resowner pointer is never
used.
However, this is an uncomfortably shaky chain of assumptions.
I want to cut it down by fixing things so that there is no
tuplestore, period, in a non-set-returning SQL function.

Furthermore, this patch changes things so that when we do make a
tuplestore, it lives in the per-query context not the fn_mcxt,
providing additional surety that it won't outlive its ResourceManager.
This is a change I made in 0dca5d68d and then had to undo for
performance reasons, but because of these resource-lifetime
considerations I really want it back that way again.

Since I think there is probably not a reachable bug here today,
I'm going to park this for v19.

            regards, tom lane

diff --git a/src/backend/executor/functions.c b/src/backend/executor/functions.c
index e0bca7cb81c..8d4d062d579 100644
--- a/src/backend/executor/functions.c
+++ b/src/backend/executor/functions.c
@@ -44,7 +44,7 @@
 typedef struct
 {
     DestReceiver pub;            /* publicly-known function pointers */
-    Tuplestorestate *tstore;    /* where to put result tuples */
+    Tuplestorestate *tstore;    /* where to put result tuples, or NULL */
     JunkFilter *filter;            /* filter to convert tuple type */
 } DR_sqlfunction;

@@ -145,11 +145,13 @@ typedef struct SQLFunctionCache
     bool        lazyEvalOK;        /* true if lazyEval is safe */
     bool        shutdown_reg;    /* true if registered shutdown callback */
     bool        lazyEval;        /* true if using lazyEval for result query */
+    bool        randomAccess;    /* true if tstore needs random access */
     bool        ownSubcontext;    /* is subcontext really a separate context? */

     ParamListInfo paramLI;        /* Param list representing current args */

-    Tuplestorestate *tstore;    /* where we accumulate result tuples */
+    Tuplestorestate *tstore;    /* where we accumulate result for a SRF */
+    MemoryContext tscontext;    /* memory context that tstore should be in */

     JunkFilter *junkFilter;        /* will be NULL if function returns VOID */
     int            jf_generation;    /* tracks whether junkFilter is up-to-date */
@@ -1250,7 +1252,7 @@ static void
 postquel_start(execution_state *es, SQLFunctionCachePtr fcache)
 {
     DestReceiver *dest;
-    MemoryContext oldcontext;
+    MemoryContext oldcontext = CurrentMemoryContext;

     Assert(es->qd == NULL);

@@ -1296,12 +1298,27 @@ postquel_start(execution_state *es, SQLFunctionCachePtr fcache)
         fcache->ownSubcontext = false;
     }

+    /*
+     * Build a tuplestore if needed, that is if it's a set-returning function
+     * and we're producing the function result without using lazyEval mode.
+     */
+    if (es->setsResult)
+    {
+        Assert(fcache->tstore == NULL);
+        if (fcache->func->returnsSet && !es->lazyEval)
+        {
+            MemoryContextSwitchTo(fcache->tscontext);
+            fcache->tstore = tuplestore_begin_heap(fcache->randomAccess,
+                                                   false, work_mem);
+        }
+    }
+
     /* Switch into the selected subcontext (might be a no-op) */
-    oldcontext = MemoryContextSwitchTo(fcache->subcontext);
+    MemoryContextSwitchTo(fcache->subcontext);

     /*
-     * If this query produces the function result, send its output to the
-     * tuplestore; else discard any output.
+     * If this query produces the function result, collect its output using
+     * our custom DestReceiver; else discard any output.
      */
     if (es->setsResult)
     {
@@ -1311,8 +1328,11 @@ postquel_start(execution_state *es, SQLFunctionCachePtr fcache)
         /* pass down the needed info to the dest receiver routines */
         myState = (DR_sqlfunction *) dest;
         Assert(myState->pub.mydest == DestSQLFunction);
-        myState->tstore = fcache->tstore;
+        myState->tstore = fcache->tstore;    /* might be NULL */
         myState->filter = fcache->junkFilter;
+
+        /* Make very sure the junkfilter's result slot is empty */
+        ExecClearTuple(fcache->junkFilter->jf_resultSlot);
     }
     else
         dest = None_Receiver;
@@ -1500,8 +1520,8 @@ postquel_get_single_result(TupleTableSlot *slot,
     /*
      * Set up to return the function value.  For pass-by-reference datatypes,
      * be sure to copy the result into the current context.  We can't leave
-     * the data in the TupleTableSlot because we intend to clear the slot
-     * before returning.
+     * the data in the TupleTableSlot because we must clear the slot before
+     * returning.
      */
     if (fcache->func->returnsTuple)
     {
@@ -1521,6 +1541,9 @@ postquel_get_single_result(TupleTableSlot *slot,
             value = datumCopy(value, fcache->func->typbyval, fcache->func->typlen);
     }

+    /* Clear the slot for next time */
+    ExecClearTuple(slot);
+
     return value;
 }

@@ -1532,6 +1555,7 @@ fmgr_sql(PG_FUNCTION_ARGS)
 {
     SQLFunctionCachePtr fcache;
     ErrorContextCallback sqlerrcontext;
+    MemoryContext tscontext;
     bool        randomAccess;
     bool        lazyEvalOK;
     bool        pushed_snapshot;
@@ -1558,11 +1582,15 @@ fmgr_sql(PG_FUNCTION_ARGS)
                      errmsg("set-valued function called in context that cannot accept a set")));
         randomAccess = rsi->allowedModes & SFRM_Materialize_Random;
         lazyEvalOK = !(rsi->allowedModes & SFRM_Materialize_Preferred);
+        /* tuplestore, if used, must have query lifespan */
+        tscontext = rsi->econtext->ecxt_per_query_memory;
     }
     else
     {
         randomAccess = false;
         lazyEvalOK = true;
+        /* we won't need a tuplestore */
+        tscontext = NULL;
     }

     /*
@@ -1570,6 +1598,10 @@ fmgr_sql(PG_FUNCTION_ARGS)
      */
     fcache = init_sql_fcache(fcinfo, lazyEvalOK);

+    /* Remember info that we might need later to construct tuplestore */
+    fcache->tscontext = tscontext;
+    fcache->randomAccess = randomAccess;
+
     /*
      * Now we can set up error traceback support for ereport()
      */
@@ -1578,20 +1610,6 @@ fmgr_sql(PG_FUNCTION_ARGS)
     sqlerrcontext.previous = error_context_stack;
     error_context_stack = &sqlerrcontext;

-    /*
-     * Build tuplestore to hold results, if we don't have one already.  We
-     * want to re-use the tuplestore across calls, so it needs to live in
-     * fcontext.
-     */
-    if (!fcache->tstore)
-    {
-        MemoryContext oldcontext;
-
-        oldcontext = MemoryContextSwitchTo(fcache->fcontext);
-        fcache->tstore = tuplestore_begin_heap(randomAccess, false, work_mem);
-        MemoryContextSwitchTo(oldcontext);
-    }
-
     /*
      * Find first unfinished execution_state.  If none, advance to the next
      * query in function.
@@ -1661,11 +1679,12 @@ fmgr_sql(PG_FUNCTION_ARGS)

         /*
          * If we ran the command to completion, we can shut it down now. Any
-         * row(s) we need to return are safely stashed in the tuplestore, and
-         * we want to be sure that, for example, AFTER triggers get fired
-         * before we return anything.  Also, if the function doesn't return
-         * set, we can shut it down anyway because it must be a SELECT and we
-         * don't care about fetching any more result rows.
+         * row(s) we need to return are safely stashed in the result slot or
+         * tuplestore, and we want to be sure that, for example, AFTER
+         * triggers get fired before we return anything.  Also, if the
+         * function doesn't return set, we can shut it down anyway because it
+         * must be a SELECT and we don't care about fetching any more result
+         * rows.
          */
         if (completed || !fcache->func->returnsSet)
             postquel_end(es, fcache);
@@ -1708,7 +1727,8 @@ fmgr_sql(PG_FUNCTION_ARGS)
     }

     /*
-     * The tuplestore now contains whatever row(s) we are supposed to return.
+     * The result slot or tuplestore now contains whatever row(s) we are
+     * supposed to return.
      */
     if (fcache->func->returnsSet)
     {
@@ -1721,16 +1741,12 @@ fmgr_sql(PG_FUNCTION_ARGS)
              * row.
              */
             Assert(es->lazyEval);
-            /* Re-use the junkfilter's output slot to fetch back the tuple */
+            /* The junkfilter's result slot contains the query result tuple */
             Assert(fcache->junkFilter);
             slot = fcache->junkFilter->jf_resultSlot;
-            if (!tuplestore_gettupleslot(fcache->tstore, true, false, slot))
-                elog(ERROR, "failed to fetch lazy-eval tuple");
+            Assert(!TTS_EMPTY(slot));
             /* Extract the result as a datum, and copy out from the slot */
             result = postquel_get_single_result(slot, fcinfo, fcache);
-            /* Clear the tuplestore, but keep it for next time */
-            /* NB: this might delete the slot's content, but we don't care */
-            tuplestore_clear(fcache->tstore);

             /*
              * Let caller know we're not finished.
@@ -1752,12 +1768,8 @@ fmgr_sql(PG_FUNCTION_ARGS)
         else if (fcache->lazyEval)
         {
             /*
-             * We are done with a lazy evaluation.  Clean up.
-             */
-            tuplestore_clear(fcache->tstore);
-
-            /*
-             * Let caller know we're finished.
+             * We are done with a lazy evaluation.  Let caller know we're
+             * finished.
              */
             rsi->isDone = ExprEndResult;

@@ -1779,7 +1791,12 @@ fmgr_sql(PG_FUNCTION_ARGS)
              * We are done with a non-lazy evaluation.  Return whatever is in
              * the tuplestore.  (It is now caller's responsibility to free the
              * tuplestore when done.)
+             *
+             * Note an edge case: we could get here without having made a
+             * tuplestore if the function is declared to return SETOF VOID.
+             * ExecMakeTableFunctionResult will cope with null setResult.
              */
+            Assert(fcache->tstore || fcache->func->rettype == VOIDOID);
             rsi->returnMode = SFRM_Materialize;
             rsi->setResult = fcache->tstore;
             fcache->tstore = NULL;
@@ -1807,9 +1824,9 @@ fmgr_sql(PG_FUNCTION_ARGS)
          */
         if (fcache->junkFilter)
         {
-            /* Re-use the junkfilter's output slot to fetch back the tuple */
+            /* The junkfilter's result slot contains the query result tuple */
             slot = fcache->junkFilter->jf_resultSlot;
-            if (tuplestore_gettupleslot(fcache->tstore, true, false, slot))
+            if (!TTS_EMPTY(slot))
                 result = postquel_get_single_result(slot, fcinfo, fcache);
             else
             {
@@ -1824,9 +1841,6 @@ fmgr_sql(PG_FUNCTION_ARGS)
             fcinfo->isnull = true;
             result = (Datum) 0;
         }
-
-        /* Clear the tuplestore, but keep it for next time */
-        tuplestore_clear(fcache->tstore);
     }

     /* Pop snapshot if we have pushed one */
@@ -2604,11 +2618,32 @@ sqlfunction_receive(TupleTableSlot *slot, DestReceiver *self)
 {
     DR_sqlfunction *myState = (DR_sqlfunction *) self;

-    /* Filter tuple as needed */
-    slot = ExecFilterJunk(myState->filter, slot);
+    if (myState->tstore)
+    {
+        /* We are collecting all of a set result into the tuplestore */
+
+        /* Filter tuple as needed */
+        slot = ExecFilterJunk(myState->filter, slot);

-    /* Store the filtered tuple into the tuplestore */
-    tuplestore_puttupleslot(myState->tstore, slot);
+        /* Store the filtered tuple into the tuplestore */
+        tuplestore_puttupleslot(myState->tstore, slot);
+    }
+    else
+    {
+        /*
+         * We only want the first tuple, which we'll save in the junkfilter's
+         * result slot.  Ignore any additional tuples passed.
+         */
+        if (TTS_EMPTY(myState->filter->jf_resultSlot))
+        {
+            /* Filter tuple as needed */
+            slot = ExecFilterJunk(myState->filter, slot);
+            Assert(slot == myState->filter->jf_resultSlot);
+
+            /* Materialize the slot so it preserves pass-by-ref values */
+            ExecMaterializeSlot(slot);
+        }
+    }

     return true;
 }

Re: SQL functions: avoid making a tuplestore unnecessarily

From

Tom Lane

Date:

02 May, 21:46:50

I wrote:
> Given the lack of field complaints over the many years it's been
> like this, there doesn't seem to be a live problem.  I think the
> explanation for that is
> (1) those mechanisms are never used to call set-returning functions,
> (2) therefore, the tuplestore will not be called on to hold more
> than one result row,
> (3) therefore, it won't get large enough that it wants to spill
> to disk,
> (4) therefore, its potentially dangling resowner pointer is never
> used.
> However, this is an uncomfortably shaky chain of assumptions.
> I want to cut it down by fixing things so that there is no
> tuplestore, period, in a non-set-returning SQL function.

Following up on this thread: Alexander Lakhin's report at [1] shows
that point (3) above is wrong: the tuplestore code will spill to disk
even when holding just one tuple, if that tuple is bigger than the
tuplestore's allowed work_mem.  (This seems kinda dubious to me, since
no memory savings can ensue.  But I have no desire to rejigger that
right now.)  So there may actually be a live bug associated with
use of a deleted resource owner here.  I've not tried to pursue that
though.

More immediately: Alexander's report also shows that there's an easily
reached bug in HEAD when the tuplestore does spill to disk.  When it
reads that tuple back in, it allocates it in the caller's memory
context, and fmgr_sql is now calling that in the short-lived context
it was called in not in its long-lived fcontext.  The end result of
that is that the long-lived result TupleTableSlot is now holding a
should_free pointer to a short-lived tuple, which ends up in an
attempt to pfree already-wiped storage during the next call of the
SQL function.

The patch I presented here eliminates that problem because with it,
fmgr_sql no longer pulls tuples out of the tuplestore at all.
So I want to apply this patch now instead of holding it for v19.

            regards, tom lane

[1] https://www.postgresql.org/message-id/9f975803-1a1c-4f21-b987-f572e110e860%40gmail.com