Re: BUG #17113: Assert failed on calling a function fixed after an extension reload - Mailing list pgsql-bugs

From Tom Lane
Subject Re: BUG #17113: Assert failed on calling a function fixed after an extension reload
Date
Msg-id 333098.1626736627@sss.pgh.pa.us
Whole thread Raw
In response to Re: BUG #17113: Assert failed on calling a function fixed after an extension reload  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-bugs
I wrote:
> PG Bug reporting form <noreply@postgresql.org> writes:
>> The following SQL snippet:
>> ...
>> leads to a failed assertion with the following stack:

> Hmm ... it seems fine in v14 and HEAD, but I do see the crash in v13.

So actually this is an uninitialized-variable problem, which makes
it mostly luck whether it fails or not.  (Turning on
RANDOMIZE_ALLOCATED_MEMORY would make it much more reproducible,
though, and I imagine valgrind would complain as well.)

The core of the issue is that this coding pattern in exec_stmt_execsql
is unsafe:

    /*
     * On the first call for this statement generate the plan, and detect
     * whether the statement is INSERT/UPDATE/DELETE
     */
    if (expr->plan == NULL)
    {
        exec_prepare_plan(estate, expr, CURSOR_OPT_PARALLEL_OK, true);
        // compute stmt->mod_stmt here
    }

It's possible for exec_prepare_plan to throw an error after setting
expr->plan, in which case if we come here again then expr->plan is
already set so we'll never compute stmt->mod_stmt at all.

Since the above is a fairly tempting coding pattern, I tried to make
it safe by rearranging things so that exec_prepare_plan wouldn't set
expr->plan until the end.  That idea crashed and burned though; the
plan refcount manipulations we do don't work correctly if we do it
like that, and trying to change that looks like it'd create fairly
substantial risks of leaking plan refcounts on error.

So I ended up with the attached, which just adds some warning comments
saying "don't do it like that" and then fixes the two places that did
do it like that (exec_stmt_call is broken too).

This requires changing struct PLpgSQL_stmt_execsql, which is something
of an ABI hazard for pldebugger and the like.  I think we can make it
safe in the back branches by putting the new mod_stmt_set bool after
the other three bools, though that's rather an unintuitive ordering.

For me, the new test case fails in v10..HEAD but not 9.6; but I think
that's just random chance.  The bug is really quite old.

Barring objections, I'll push this tomorrow.

(Memo to self: the back branches have more variants of the same issue,
so their patches are going to look different.)

            regards, tom lane

diff --git a/src/pl/plpgsql/src/expected/plpgsql_simple.out b/src/pl/plpgsql/src/expected/plpgsql_simple.out
index 5a2fefa03e..e1f5d670fb 100644
--- a/src/pl/plpgsql/src/expected/plpgsql_simple.out
+++ b/src/pl/plpgsql/src/expected/plpgsql_simple.out
@@ -66,3 +66,26 @@ select simplecaller();
           555
 (1 row)

+-- Check case where first attempt to determine if it's simple fails
+create function simplesql() returns int language sql
+as $$select 1 / 0$$;
+create or replace function simplecaller() returns int language plpgsql
+as $$
+declare x int;
+begin
+  select simplesql() into x;
+  return x;
+end$$;
+select simplecaller();  -- division by zero occurs during simple-expr check
+ERROR:  division by zero
+CONTEXT:  SQL function "simplesql" during inlining
+SQL statement "select simplesql()"
+PL/pgSQL function simplecaller() line 4 at SQL statement
+create or replace function simplesql() returns int language sql
+as $$select 2 + 2$$;
+select simplecaller();
+ simplecaller
+--------------
+            4
+(1 row)
+
diff --git a/src/pl/plpgsql/src/pl_exec.c b/src/pl/plpgsql/src/pl_exec.c
index b5c208d83d..14bbe12da5 100644
--- a/src/pl/plpgsql/src/pl_exec.c
+++ b/src/pl/plpgsql/src/pl_exec.c
@@ -2162,22 +2162,20 @@ exec_stmt_call(PLpgSQL_execstate *estate, PLpgSQL_stmt_call *stmt)
      * Make a plan if we don't have one already.
      */
     if (expr->plan == NULL)
-    {
         exec_prepare_plan(estate, expr, 0);

-        /*
-         * A CALL or DO can never be a simple expression.
-         */
-        Assert(!expr->expr_simple_expr);
+    /*
+     * A CALL or DO can never be a simple expression.
+     */
+    Assert(!expr->expr_simple_expr);

-        /*
-         * Also construct a DTYPE_ROW datum representing the plpgsql variables
-         * associated with the procedure's output arguments.  Then we can use
-         * exec_move_row() to do the assignments.
-         */
-        if (stmt->is_call)
-            stmt->target = make_callstmt_target(estate, expr);
-    }
+    /*
+     * Also construct a DTYPE_ROW datum representing the plpgsql variables
+     * associated with the procedure's output arguments.  Then we can use
+     * exec_move_row() to do the assignments.
+     */
+    if (stmt->is_call && stmt->target == NULL)
+        stmt->target = make_callstmt_target(estate, expr);

     paramLI = setup_param_list(estate, expr);

@@ -4093,6 +4091,22 @@ exec_eval_cleanup(PLpgSQL_execstate *estate)

 /* ----------
  * Generate a prepared plan
+ *
+ * CAUTION: it is possible for this function to throw an error after it has
+ * built a SPIPlan and saved it in expr->plan.  Therefore, be wary of doing
+ * additional things contingent on expr->plan being NULL.  That is, given
+ * code like
+ *
+ *    if (query->plan == NULL)
+ *    {
+ *        // okay to put setup code here
+ *        exec_prepare_plan(estate, query, ...);
+ *        // NOT okay to put more logic here
+ *    }
+ *
+ * extra steps at the end are unsafe because they will not be executed when
+ * re-executing the calling statement, if exec_prepare_plan failed the first
+ * time.  This is annoyingly error-prone, but the alternatives are worse.
  * ----------
  */
 static void
@@ -4156,10 +4170,12 @@ exec_stmt_execsql(PLpgSQL_execstate *estate,
      * whether the statement is INSERT/UPDATE/DELETE
      */
     if (expr->plan == NULL)
+        exec_prepare_plan(estate, expr, CURSOR_OPT_PARALLEL_OK);
+
+    if (!stmt->mod_stmt_set)
     {
         ListCell   *l;

-        exec_prepare_plan(estate, expr, CURSOR_OPT_PARALLEL_OK);
         stmt->mod_stmt = false;
         foreach(l, SPI_plan_get_plan_sources(expr->plan))
         {
@@ -4179,6 +4195,7 @@ exec_stmt_execsql(PLpgSQL_execstate *estate,
                 break;
             }
         }
+        stmt->mod_stmt_set = true;
     }

     /*
@@ -7846,6 +7863,10 @@ get_cast_hashentry(PLpgSQL_execstate *estate,
  * exec_simple_check_plan -        Check if a plan is simple enough to
  *                                be evaluated by ExecEvalExpr() instead
  *                                of SPI.
+ *
+ * Note: the refcount manipulations in this function assume that expr->plan
+ * is a "saved" SPI plan.  That's a bit annoying from the caller's standpoint,
+ * but it's otherwise difficult to avoid leaking the plan on failure.
  * ----------
  */
 static void
@@ -7929,7 +7950,8 @@ exec_simple_check_plan(PLpgSQL_execstate *estate, PLpgSQL_expr *expr)
      * OK, we can treat it as a simple plan.
      *
      * Get the generic plan for the query.  If replanning is needed, do that
-     * work in the eval_mcontext.
+     * work in the eval_mcontext.  (Note that replanning could throw an error,
+     * in which case the expr is left marked "not simple", which is fine.)
      */
     oldcontext = MemoryContextSwitchTo(get_eval_mcontext(estate));
     cplan = SPI_plan_get_cached_plan(expr->plan);
diff --git a/src/pl/plpgsql/src/pl_gram.y b/src/pl/plpgsql/src/pl_gram.y
index 3fcca43b90..0f6a5b30b1 100644
--- a/src/pl/plpgsql/src/pl_gram.y
+++ b/src/pl/plpgsql/src/pl_gram.y
@@ -3043,7 +3043,7 @@ make_execsql_stmt(int firsttoken, int location)

     check_sql_expr(expr->query, expr->parseMode, location);

-    execsql = palloc(sizeof(PLpgSQL_stmt_execsql));
+    execsql = palloc0(sizeof(PLpgSQL_stmt_execsql));
     execsql->cmd_type = PLPGSQL_STMT_EXECSQL;
     execsql->lineno  = plpgsql_location_to_lineno(location);
     execsql->stmtid  = ++plpgsql_curr_compile->nstatements;
diff --git a/src/pl/plpgsql/src/plpgsql.h b/src/pl/plpgsql/src/plpgsql.h
index fcbfcd678b..ebd3a5d3c8 100644
--- a/src/pl/plpgsql/src/plpgsql.h
+++ b/src/pl/plpgsql/src/plpgsql.h
@@ -893,8 +893,8 @@ typedef struct PLpgSQL_stmt_execsql
     int            lineno;
     unsigned int stmtid;
     PLpgSQL_expr *sqlstmt;
-    bool        mod_stmt;        /* is the stmt INSERT/UPDATE/DELETE?  Note:
-                                 * mod_stmt is set when we plan the query */
+    bool        mod_stmt;        /* is the stmt INSERT/UPDATE/DELETE? */
+    bool        mod_stmt_set;    /* is mod_stmt valid yet? */
     bool        into;            /* INTO supplied? */
     bool        strict;            /* INTO STRICT flag */
     PLpgSQL_variable *target;    /* INTO target (record or row) */
diff --git a/src/pl/plpgsql/src/sql/plpgsql_simple.sql b/src/pl/plpgsql/src/sql/plpgsql_simple.sql
index 8a95768073..57020d22d6 100644
--- a/src/pl/plpgsql/src/sql/plpgsql_simple.sql
+++ b/src/pl/plpgsql/src/sql/plpgsql_simple.sql
@@ -59,3 +59,24 @@ select simplecaller();
 \c -

 select simplecaller();
+
+
+-- Check case where first attempt to determine if it's simple fails
+
+create function simplesql() returns int language sql
+as $$select 1 / 0$$;
+
+create or replace function simplecaller() returns int language plpgsql
+as $$
+declare x int;
+begin
+  select simplesql() into x;
+  return x;
+end$$;
+
+select simplecaller();  -- division by zero occurs during simple-expr check
+
+create or replace function simplesql() returns int language sql
+as $$select 2 + 2$$;
+
+select simplecaller();

pgsql-bugs by date:

Previous
From: Alvaro Herrera
Date:
Subject: Re: BUG #17103: WAL segments are not removed after exceeding max_slot_wal_keep_size
Next
From: Kyotaro Horiguchi
Date:
Subject: Re: BUG #17103: WAL segments are not removed after exceeding max_slot_wal_keep_size