Re: calling procedures is slow and consumes extra much memory against calling function - Mailing list pgsql-hackers

From Tom Lane
Subject Re: calling procedures is slow and consumes extra much memory against calling function
Date
Msg-id 193996.1601255046@sss.pgh.pa.us
Whole thread Raw
In response to Re: calling procedures is slow and consumes extra much memory against calling function  (Pavel Stehule <pavel.stehule@gmail.com>)
Responses Re: calling procedures is slow and consumes extra much memory against calling function  (Pavel Stehule <pavel.stehule@gmail.com>)
List pgsql-hackers
Pavel Stehule <pavel.stehule@gmail.com> writes:
> I am sending another patch that tries to allow CachedPlans for CALL
> statements. I think this patch is very accurate, but it is not nice,
> because it is smudging very precious reference counting for CachedPlans.

I spent some time testing this.  Although the #1 patch gets rid of
the major memory leak of cached plans, the original test case still
shows a pretty substantial leak across repeated executions of a CALL.
The reason is that the stanza for rebuilding stmt->target also gets
executed each time through, and that leaks not only the relatively
small PLpgSQL_row datum but also a bunch of catalog lookup cruft
created on the way to building the datum.  Basically this code forgot
that plpgsql's outer execution layer can't assume that it's running
in a short-lived context.

I attach a revised #1 that takes care of that problem, and also
cleans up what seems to me to be pretty sloppy thinking in both
the original code and Pavel's #1 patch: we should be restoring
the previous value of expr->plan, not cavalierly assuming that
it was necessarily NULL.  I didn't care for looking at the plan's
"saved" field to decide what was happening, either.  We really
should have a local flag variable clearly defining which behavior
it is that we're implementing.

With this patch, I see zero memory bloat on Pavel's original example,
even with a much larger repeat count.

I don't like much of anything about plpgsql-stmt_call-fix-2.patch.
It feels confused and poorly documented, possibly because "fragile"
is not a very clear term for whatever property it is you're trying to
attribute to plans.  But in any case, I think it's fixing the problem
in the wrong place.  I think the right way to fix it probably is to
manage a CALL's saved plan the same as every other plpgsql plan,
but arrange for the transient refcount on that plan to be held by a
ResourceOwner that is not a child of any transaction resowner, but
rather belongs to the procedure's execution and will be released on
the way out of the procedure.

In any case, I doubt we'd risk back-patching either the #2 patch
or any other approach to avoiding the repeat planning.  We need a
back-patchable fix that at least tamps down the memory bloat,
and this seems like it'd do.

            regards, tom lane

diff --git a/src/pl/plpgsql/src/pl_exec.c b/src/pl/plpgsql/src/pl_exec.c
index d4a3d58daa..7a2f2dfe91 100644
--- a/src/pl/plpgsql/src/pl_exec.c
+++ b/src/pl/plpgsql/src/pl_exec.c
@@ -2145,40 +2145,60 @@ exec_stmt_perform(PLpgSQL_execstate *estate, PLpgSQL_stmt_perform *stmt)

 /*
  * exec_stmt_call
+ *
+ * NOTE: this is used for both CALL and DO statements.
  */
 static int
 exec_stmt_call(PLpgSQL_execstate *estate, PLpgSQL_stmt_call *stmt)
 {
     PLpgSQL_expr *expr = stmt->expr;
+    SPIPlanPtr    orig_plan = expr->plan;
+    bool        local_plan;
+    PLpgSQL_variable *volatile cur_target = stmt->target;
     volatile LocalTransactionId before_lxid;
     LocalTransactionId after_lxid;
     volatile bool pushed_active_snap = false;
     volatile int rc;

+    /*
+     * If not in atomic context, we make a local plan that we'll just use for
+     * this invocation, and will free at the end.  Otherwise, transaction ends
+     * would cause errors about plancache leaks.
+     *
+     * XXX This would be fixable with some plancache/resowner surgery
+     * elsewhere, but for now we'll just work around this here.
+     */
+    local_plan = !estate->atomic;
+
     /* PG_TRY to ensure we clear the plan link, if needed, on failure */
     PG_TRY();
     {
         SPIPlanPtr    plan = expr->plan;
         ParamListInfo paramLI;

-        if (plan == NULL)
+        /*
+         * Make a plan if we don't have one, or if we need a local one.  Note
+         * that we'll overwrite expr->plan either way; the PG_TRY block will
+         * ensure we undo that on the way out, if the plan is local.
+         */
+        if (plan == NULL || local_plan)
         {
+            /* Don't let SPI save the plan if it's going to be local */
+            exec_prepare_plan(estate, expr, 0, !local_plan);
+            plan = expr->plan;

             /*
-             * Don't save the plan if not in atomic context.  Otherwise,
-             * transaction ends would cause errors about plancache leaks.
-             *
-             * XXX This would be fixable with some plancache/resowner surgery
-             * elsewhere, but for now we'll just work around this here.
+             * A CALL should never be a simple expression.  (If it could be,
+             * we'd have to worry about saving/restoring the previous values
+             * of the related expr fields, not just expr->plan.)
              */
-            exec_prepare_plan(estate, expr, 0, estate->atomic);
+            Assert(!expr->expr_simple_expr);

             /*
              * The procedure call could end transactions, which would upset
              * the snapshot management in SPI_execute*, so don't let it do it.
              * Instead, we set the snapshots ourselves below.
              */
-            plan = expr->plan;
             plan->no_snapshots = true;

             /*
@@ -2186,14 +2206,20 @@ exec_stmt_call(PLpgSQL_execstate *estate, PLpgSQL_stmt_call *stmt)
              * case the procedure's argument list has changed.
              */
             stmt->target = NULL;
+            cur_target = NULL;
         }

         /*
          * We construct a DTYPE_ROW datum representing the plpgsql variables
          * associated with the procedure's output arguments.  Then we can use
-         * exec_move_row() to do the assignments.
+         * exec_move_row() to do the assignments.  If we're using a local
+         * plan, also make a local target; otherwise, since the above code
+         * will force a new plan each time through, we'd repeatedly leak the
+         * memory for the target.  (Note: we also leak the target when a plan
+         * change is forced, but that isn't so likely to cause excessive
+         * memory leaks.)
          */
-        if (stmt->is_call && stmt->target == NULL)
+        if (stmt->is_call && cur_target == NULL)
         {
             Node       *node;
             FuncExpr   *funcexpr;
@@ -2208,6 +2234,9 @@ exec_stmt_call(PLpgSQL_execstate *estate, PLpgSQL_stmt_call *stmt)
             int            i;
             ListCell   *lc;

+            /* Use eval_mcontext for any cruft accumulated here */
+            oldcontext = MemoryContextSwitchTo(get_eval_mcontext(estate));
+
             /*
              * Get the parsed CallStmt, and look up the called procedure
              */
@@ -2239,9 +2268,11 @@ exec_stmt_call(PLpgSQL_execstate *estate, PLpgSQL_stmt_call *stmt)
             ReleaseSysCache(func_tuple);

             /*
-             * Begin constructing row Datum
+             * Begin constructing row Datum; keep it in fn_cxt if it's to be
+             * long-lived.
              */
-            oldcontext = MemoryContextSwitchTo(estate->func->fn_cxt);
+            if (!local_plan)
+                MemoryContextSwitchTo(estate->func->fn_cxt);

             row = (PLpgSQL_row *) palloc0(sizeof(PLpgSQL_row));
             row->dtype = PLPGSQL_DTYPE_ROW;
@@ -2249,7 +2280,8 @@ exec_stmt_call(PLpgSQL_execstate *estate, PLpgSQL_stmt_call *stmt)
             row->lineno = -1;
             row->varnos = (int *) palloc(sizeof(int) * list_length(funcargs));

-            MemoryContextSwitchTo(oldcontext);
+            if (!local_plan)
+                MemoryContextSwitchTo(get_eval_mcontext(estate));

             /*
              * Examine procedure's argument list.  Each output arg position
@@ -2293,7 +2325,13 @@ exec_stmt_call(PLpgSQL_execstate *estate, PLpgSQL_stmt_call *stmt)

             row->nfields = nfields;

-            stmt->target = (PLpgSQL_variable *) row;
+            cur_target = (PLpgSQL_variable *) row;
+
+            /* We can save and re-use the target datum, if it's not local */
+            if (!local_plan)
+                stmt->target = cur_target;
+
+            MemoryContextSwitchTo(oldcontext);
         }

         paramLI = setup_param_list(estate, expr);
@@ -2316,17 +2354,27 @@ exec_stmt_call(PLpgSQL_execstate *estate, PLpgSQL_stmt_call *stmt)
     PG_CATCH();
     {
         /*
-         * If we aren't saving the plan, unset the pointer.  Note that it
-         * could have been unset already, in case of a recursive call.
+         * If we are using a local plan, restore the old plan link.
          */
-        if (expr->plan && !expr->plan->saved)
-            expr->plan = NULL;
+        if (local_plan)
+            expr->plan = orig_plan;
         PG_RE_THROW();
     }
     PG_END_TRY();

-    if (expr->plan && !expr->plan->saved)
-        expr->plan = NULL;
+    /*
+     * If we are using a local plan, restore the old plan link; then free the
+     * local plan to avoid memory leaks.  (Note that the error exit path above
+     * just clears the link without risking calling SPI_freeplan; we expect
+     * that xact cleanup will take care of the mess in that case.)
+     */
+    if (local_plan)
+    {
+        SPIPlanPtr    plan = expr->plan;
+
+        expr->plan = orig_plan;
+        SPI_freeplan(plan);
+    }

     if (rc < 0)
         elog(ERROR, "SPI_execute_plan_with_paramlist failed executing query \"%s\": %s",
@@ -2363,10 +2411,10 @@ exec_stmt_call(PLpgSQL_execstate *estate, PLpgSQL_stmt_call *stmt)
     {
         SPITupleTable *tuptab = SPI_tuptable;

-        if (!stmt->target)
+        if (!cur_target)
             elog(ERROR, "DO statement returned a row");

-        exec_move_row(estate, stmt->target, tuptab->vals[0], tuptab->tupdesc);
+        exec_move_row(estate, cur_target, tuptab->vals[0], tuptab->tupdesc);
     }
     else if (SPI_processed > 1)
         elog(ERROR, "procedure call returned more than one row");

pgsql-hackers by date:

Previous
From: Andy Fan
Date:
Subject: Re: Partition prune with stable Expr
Next
From: Tom Lane
Date:
Subject: Re: Partition prune with stable Expr