Thread: BUG #18576: Using EXPLAIN (VERBOSE) in information_schema.element_types returns ERROR: failed to find plan for

The following bug has been logged on the website:

Bug reference:      18576
Logged by:          Vasya B
Email address:      vasiliy.boytsov@gmail.com
PostgreSQL version: 16.3
Operating system:   Ubuntu 24.04
Description:

From a clean DB, one can execute:
  EXPLAIN (VERBOSE) SELECT FROM information_schema.element_types WHERE
object_type = 'TABLE'; 
Which returns:
  ERROR:  failed to find plan for subquery ss
While the expected result was a working query.
W/O VERBOSE this query works.


On Thu, Aug 8, 2024 at 8:23 AM PG Bug reporting form
<noreply@postgresql.org> wrote:
> From a clean DB, one can execute:
>   EXPLAIN (VERBOSE) SELECT FROM information_schema.element_types WHERE
> object_type = 'TABLE';
> Which returns:
>   ERROR:  failed to find plan for subquery ss

Thanks for the report!

I think the problem is that when we see a Var that references a
SUBQUERY RTE when deparsing a Plan tree to get the name of a field, we
assume that we are in a SubqueryScan plan node, in which case the code
is no problem because set_deparse_plan has set dpns->inner_plan to its
child plan.  However, this bug shows that this assumption does not
always hold: we might instead be in a Result node with a Var
referencing a SUBQUERY RTE.  This problem can be reproduced with the
query below.

EXPLAIN (VERBOSE, COSTS OFF)
SELECT (ss.a).x, (ss.a).n FROM
  (SELECT information_schema._pg_expandarray(ARRAY[1,2]) AS a) ss
WHERE FALSE;
ERROR:  failed to find plan for subquery ss

In this case, due to the constant false filter, the whole plan is
reduced to a dummy Result node, with a targetlist consisting of 'a.x'
and 'a.n', where 'a' is a Var referencing the SUBQUERY RTE.  We do not
generate a SubqueryScan plan node for the subquery, as the relation is
recognized as dummy.  That is to say, we neither have a valid
rte->subquery nor a valid SubqueryScan plan node.  So it seems to me
that there is no easy way to get the names of the fields in this case.
I'm wondering whether we can just compose a fake name with something
like below?

@@ -7903,6 +7903,14 @@ get_name_for_var_field(Var *var, int fieldno,
                deparse_namespace save_dpns;
                const char *result;

+               if (IsA(dpns->plan, Result))
+               {
+                   char       *fakeCol = palloc(32);
+
+                   snprintf(fakeCol, sizeof(fakeCol), "col%d", fieldno);
+                   return fakeCol;
+               }
+
                if (!dpns->inner_plan)
                    elog(ERROR, "failed to find plan for subquery %s",
                         rte->eref->aliasname);

This same problem can also happen to CTEs.

EXPLAIN (VERBOSE, COSTS OFF)
WITH ss AS MATERIALIZED
  (SELECT information_schema._pg_expandarray(ARRAY[1,2]) AS a)
SELECT (ss.a).x, (ss.a).n FROM ss WHERE FALSE;
ERROR:  failed to find plan for CTE ss

Thanks
Richard



Richard Guo <guofenglinux@gmail.com> writes:
> I think the problem is that when we see a Var that references a
> SUBQUERY RTE when deparsing a Plan tree to get the name of a field, we
> assume that we are in a SubqueryScan plan node, in which case the code
> is no problem because set_deparse_plan has set dpns->inner_plan to its
> child plan.  However, this bug shows that this assumption does not
> always hold: we might instead be in a Result node with a Var
> referencing a SUBQUERY RTE.

Yeah.  I think the comment about that in get_name_for_var_field was
accurate when written, but that was a few rounds of planner
improvements ago.  I found that your simplified query works up to
9.5 but fails in >= 9.6, whereupon I bisected to find

3fc6e2d7f5b652b417fa6937c34de2438d60fa9f is the first bad commit
commit 3fc6e2d7f5b652b417fa6937c34de2438d60fa9f
Author: Tom Lane <tgl@sss.pgh.pa.us>
Date:   Mon Mar 7 15:58:22 2016 -0500

    Make the upper part of the planner work by generating and comparing Paths.

Immediately before that, the non-VERBOSE description of the plan was

 Result
   One-Time Filter: false
   ->  Result

while that commit changed it to

 Result
   One-Time Filter: false

So even before that, we'd been emitting a Result not a SubqueryScan
plan node, but it worked anyway because the lower Result had the
same tlist as the SubqueryScan it replaced: the VERBOSE output was

 Result
   Output: ((information_schema._pg_expandarray('{1,2}'::integer[]))).x,
((information_schema._pg_expandarray('{1,2}'::integer[]))).n
   One-Time Filter: false
   ->  Result
         Output: information_schema._pg_expandarray('{1,2}'::integer[])

However, once we decided we didn't really need the child plan node at
all, get_name_for_var_field was broken.  I'm surprised it took this
long to notice.

> ... That is to say, we neither have a valid
> rte->subquery nor a valid SubqueryScan plan node.  So it seems to me
> that there is no easy way to get the names of the fields in this case.

Indeed.

> I'm wondering whether we can just compose a fake name with something
> like below?

This seems like a band-aid.  Also, I experimented with it and found
that for your test query, the output is

 Result
   Output: (a).col1, (a).col2
   One-Time Filter: false

This is confusing (where'd "a" come from?) and it makes me quite
nervous that there are other cases where we'd also fail.  What
we basically have here is a dangling-reference Var with no valid
referent.  That's wrong in itself and it seems like it risks
causing executor problems not only EXPLAIN problems.  It's just
minimally safe because we know that the tlist will never actually
get evaluated ... but this could easily trip up logic that runs
during executor startup.

I wonder if we shouldn't change what the planner puts into the tlist
of a dummy Result.  That is, I'm imagining reducing the tlist of
such a Result to null Consts that would serve to show the right column
datatypes and not much else:

 Result
   Output: NULL, NULL
   One-Time Filter: false

I've not looked at how messy this might get, though.

> This same problem can also happen to CTEs.

Yeah, basically the same thing in the RTE_CTE switch case.

            regards, tom lane



I wrote:
> I wonder if we shouldn't change what the planner puts into the tlist
> of a dummy Result.  That is, I'm imagining reducing the tlist of
> such a Result to null Consts that would serve to show the right column
> datatypes and not much else:

>  Result
>    Output: NULL, NULL
>    One-Time Filter: false

I experimented with this, and was interested to find that there's
basically only one place we could do it in a low-level, localized
fashion, namely in set_plan_refs' handling of Result nodes; if you try
to do it earlier then set_upper_references fails when there's an upper
plan node that should copy such a column's value.  And there's
*already* an ugly hack there, which we could replace with a more
general hack.  See 0001 attached.

However, after testing it I got less excited, because it caused
quite a lot of regression test changes (which I didn't bother to
include in 0001).  Many of them seem like significant readability
lossage, for example

  GroupAggregate
-   Group Key: pagg_tab1.y
+   Group Key: (NULL::integer)
    ->  Sort
-         Sort Key: pagg_tab1.y
+         Sort Key: (NULL::integer)
          ->  Result
                One-Time Filter: false

and in a few cases I feel like the change actually means that the
test no longer proves what it intends to, because significant
details vanish into a sea of NULLs.

Another problem, which we could probably work around, is that this
breaks removal of trivial SubqueryScan nodes in some cases, because
that happens even later and the tlists no longer match.

So I feel like my idea is a dead end.  Even without the above
problems, a significant change in EXPLAIN's behavior in back
branches would be a hard sell --- the more so if it's to fix
a case that nobody even noticed for eight years.

So I'm coming around to doing something like the quick hack you
proposed.  I don't like it too much because it seems like it could
make other bugs in this area much harder to notice, but I don't have a
better idea.  We do need some work on the outdated comments though.
Also, I think we should use "f%d" not "col%d" by analogy to the
default field names for RowExprs and anonymous record types.
See 0002 attached.

            regards, tom lane

diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c
index 7aed84584c..7c417d4e14 100644
--- a/src/backend/optimizer/plan/setrefs.c
+++ b/src/backend/optimizer/plan/setrefs.c
@@ -1013,26 +1013,41 @@ set_plan_refs(PlannerInfo *root, Plan *plan, int rtoffset)
                 else
                 {
                     /*
-                     * The tlist of a childless Result could contain
-                     * unresolved ROWID_VAR Vars, in case it's representing a
-                     * target relation which is completely empty because of
-                     * constraint exclusion.  Replace any such Vars by null
-                     * constants, as though they'd been resolved for a leaf
-                     * scan node that doesn't support them.  We could have
-                     * fix_scan_expr do this, but since the case is only
-                     * expected to occur here, it seems safer to special-case
-                     * it here and keep the assertions that ROWID_VARs
-                     * shouldn't be seen by fix_scan_expr.
+                     * The tlist of a dummy Result (one with constant-false
+                     * resconstantqual) could contain Vars, which are
+                     * effectively dangling references and have the potential
+                     * to cause problems in the executor or EXPLAIN.  We could
+                     * not get rid of these earlier without breaking Var
+                     * replacement in upper plan levels, but now let's reduce
+                     * the tlist to a bunch of NULL constants.  Since the
+                     * tlist should never be executed, this has no run-time
+                     * effect.  (One useful side-effect is that this makes it
+                     * safe for fix_scan_expr to assert that it won't see
+                     * ROWID_VARs; without this, unresolved ROWID Vars could
+                     * be found in the tlist of such a result.)
                      */
-                    foreach(l, splan->plan.targetlist)
+                    bool        dummy = false;
+
+                    if (list_length((List *) splan->resconstantqual) == 1)
                     {
-                        TargetEntry *tle = (TargetEntry *) lfirst(l);
-                        Var           *var = (Var *) tle->expr;
+                        Const       *cqual = (Const *) linitial((List *) splan->resconstantqual);
+
+                        dummy = IsA(cqual, Const) &&
+                            (cqual->constisnull ||
+                             !DatumGetBool(cqual->constvalue));
+                    }

-                        if (var && IsA(var, Var) && var->varno == ROWID_VAR)
-                            tle->expr = (Expr *) makeNullConst(var->vartype,
-                                                               var->vartypmod,
-                                                               var->varcollid);
+                    if (dummy)
+                    {
+                        foreach(l, splan->plan.targetlist)
+                        {
+                            TargetEntry *tle = (TargetEntry *) lfirst(l);
+                            Node       *expr = (Node *) tle->expr;
+
+                            tle->expr = (Expr *) makeNullConst(exprType(expr),
+                                                               exprTypmod(expr),
+                                                               exprCollation(expr));
+                        }
                     }

                     splan->plan.targetlist =
diff --git a/src/backend/utils/adt/ruleutils.c b/src/backend/utils/adt/ruleutils.c
index 653685bffc..ab88355d9a 100644
--- a/src/backend/utils/adt/ruleutils.c
+++ b/src/backend/utils/adt/ruleutils.c
@@ -7895,17 +7895,29 @@ get_name_for_var_field(Var *var, int fieldno,
                     /*
                      * We're deparsing a Plan tree so we don't have complete
                      * RTE entries (in particular, rte->subquery is NULL). But
-                     * the only place we'd see a Var directly referencing a
-                     * SUBQUERY RTE is in a SubqueryScan plan node, and we can
-                     * look into the child plan's tlist instead.
+                     * the only place we'd normally see a Var directly
+                     * referencing a SUBQUERY RTE is in a SubqueryScan plan
+                     * node, and we can look into the child plan's tlist
+                     * instead.  An exception occurs if the subquery was
+                     * proven empty and optimized away: then we'd find such a
+                     * Var in a childless Result node, and there's nothing in
+                     * the plan tree that would let us figure out what it had
+                     * originally referenced.  In that case, fall back on
+                     * printing "fN", analogously to the default column names
+                     * for RowExprs.
                      */
                     TargetEntry *tle;
                     deparse_namespace save_dpns;
                     const char *result;

                     if (!dpns->inner_plan)
-                        elog(ERROR, "failed to find plan for subquery %s",
-                             rte->eref->aliasname);
+                    {
+                        char       *dummyname = palloc(32);
+
+                        snprintf(dummyname, 32, "f%d", fieldno);
+                        return dummyname;
+                    }
+
                     tle = get_tle_by_resno(dpns->inner_tlist, attnum);
                     if (!tle)
                         elog(ERROR, "bogus varattno for subquery var: %d",
@@ -8014,20 +8026,27 @@ get_name_for_var_field(Var *var, int fieldno,
                 {
                     /*
                      * We're deparsing a Plan tree so we don't have a CTE
-                     * list.  But the only places we'd see a Var directly
-                     * referencing a CTE RTE are in CteScan or WorkTableScan
-                     * plan nodes.  For those cases, set_deparse_plan arranged
-                     * for dpns->inner_plan to be the plan node that emits the
-                     * CTE or RecursiveUnion result, and we can look at its
-                     * tlist instead.
+                     * list.  But the only places we'd normally see a Var
+                     * directly referencing a CTE RTE are in CteScan or
+                     * WorkTableScan plan nodes.  For those cases,
+                     * set_deparse_plan arranged for dpns->inner_plan to be
+                     * the plan node that emits the CTE or RecursiveUnion
+                     * result, and we can look at its tlist instead.  As
+                     * above, this can fail if the CTE has been proven empty,
+                     * in which case fall back to "fN".
                      */
                     TargetEntry *tle;
                     deparse_namespace save_dpns;
                     const char *result;

                     if (!dpns->inner_plan)
-                        elog(ERROR, "failed to find plan for CTE %s",
-                             rte->eref->aliasname);
+                    {
+                        char       *dummyname = palloc(32);
+
+                        snprintf(dummyname, 32, "f%d", fieldno);
+                        return dummyname;
+                    }
+
                     tle = get_tle_by_resno(dpns->inner_tlist, attnum);
                     if (!tle)
                         elog(ERROR, "bogus varattno for subquery var: %d",
diff --git a/src/test/regress/expected/rowtypes.out b/src/test/regress/expected/rowtypes.out
index b400b58f76..9168979a62 100644
--- a/src/test/regress/expected/rowtypes.out
+++ b/src/test/regress/expected/rowtypes.out
@@ -1300,6 +1300,60 @@ select pg_get_viewdef('composite_v', true);
 (1 row)

 drop view composite_v;
+--
+-- Check cases where the composite comes from a proven-dummy rel (bug #18576)
+--
+explain (verbose, costs off)
+select (ss.a).x, (ss.a).n from
+  (select information_schema._pg_expandarray(array[1,2]) AS a) ss;
+                               QUERY PLAN
+------------------------------------------------------------------------
+ Subquery Scan on ss
+   Output: (ss.a).x, (ss.a).n
+   ->  ProjectSet
+         Output: information_schema._pg_expandarray('{1,2}'::integer[])
+         ->  Result
+(5 rows)
+
+explain (verbose, costs off)
+select (ss.a).x, (ss.a).n from
+  (select information_schema._pg_expandarray(array[1,2]) AS a) ss
+where false;
+        QUERY PLAN
+--------------------------
+ Result
+   Output: (a).f1, (a).f2
+   One-Time Filter: false
+(3 rows)
+
+explain (verbose, costs off)
+with cte(c) as materialized (select row(1, 2)),
+     cte2(c) as (select * from cte)
+select (c).f1 from cte2 as t;
+            QUERY PLAN
+-----------------------------------
+ CTE Scan on cte
+   Output: (cte.c).f1
+   CTE cte
+     ->  Result
+           Output: '(1,2)'::record
+(5 rows)
+
+explain (verbose, costs off)
+with cte(c) as materialized (select row(1, 2)),
+     cte2(c) as (select * from cte)
+select (c).f1 from cte2 as t
+where false;
+            QUERY PLAN
+-----------------------------------
+ Result
+   Output: (cte.c).f1
+   One-Time Filter: false
+   CTE cte
+     ->  Result
+           Output: '(1,2)'::record
+(6 rows)
+
 --
 -- Tests for component access / FieldSelect
 --
diff --git a/src/test/regress/sql/rowtypes.sql b/src/test/regress/sql/rowtypes.sql
index fd47dc9e0f..174b062144 100644
--- a/src/test/regress/sql/rowtypes.sql
+++ b/src/test/regress/sql/rowtypes.sql
@@ -520,6 +520,27 @@ where (select * from (select c as c1) s
 select pg_get_viewdef('composite_v', true);
 drop view composite_v;

+--
+-- Check cases where the composite comes from a proven-dummy rel (bug #18576)
+--
+explain (verbose, costs off)
+select (ss.a).x, (ss.a).n from
+  (select information_schema._pg_expandarray(array[1,2]) AS a) ss;
+explain (verbose, costs off)
+select (ss.a).x, (ss.a).n from
+  (select information_schema._pg_expandarray(array[1,2]) AS a) ss
+where false;
+
+explain (verbose, costs off)
+with cte(c) as materialized (select row(1, 2)),
+     cte2(c) as (select * from cte)
+select (c).f1 from cte2 as t;
+explain (verbose, costs off)
+with cte(c) as materialized (select row(1, 2)),
+     cte2(c) as (select * from cte)
+select (c).f1 from cte2 as t
+where false;
+
 --
 -- Tests for component access / FieldSelect
 --

On Fri, Aug 9, 2024 at 4:35 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> However, after testing it I got less excited, because it caused
> quite a lot of regression test changes (which I didn't bother to
> include in 0001).  Many of them seem like significant readability
> lossage, for example
>
>   GroupAggregate
> -   Group Key: pagg_tab1.y
> +   Group Key: (NULL::integer)
>     ->  Sort
> -         Sort Key: pagg_tab1.y
> +         Sort Key: (NULL::integer)
>           ->  Result
>                 One-Time Filter: false

Yeah, it seems that the original tlist of a dummy Result is needed to
deparse upper plan nodes.  With 0001, expressions that should be
matched to lower tlist might end up with all NULLs, which seems not
great.

> So I'm coming around to doing something like the quick hack you
> proposed.  I don't like it too much because it seems like it could
> make other bugs in this area much harder to notice, but I don't have a
> better idea.  We do need some work on the outdated comments though.
> Also, I think we should use "f%d" not "col%d" by analogy to the
> default field names for RowExprs and anonymous record types.

Agreed.  Do you think it would be helpful to add some assertions to
verify the plan type?  For example, if dpns->inner_plan is NULL, the
plan should be a Result node; otherwise, it must be a SubqueryScan
node (or in the RTE_CTE case, it must be a CteScan/WorkTableScan
node).

Thanks
Richard



Richard Guo <guofenglinux@gmail.com> writes:
> Agreed.  Do you think it would be helpful to add some assertions to
> verify the plan type?

I thought about that but didn't experiment with it.  I wonder whether
it'd just make the code more fragile.  Still, it might be useful to
verify that things are happening as we expect.

            regards, tom lane



I wrote:
> Richard Guo <guofenglinux@gmail.com> writes:
>> Agreed.  Do you think it would be helpful to add some assertions to
>> verify the plan type?

> I thought about that but didn't experiment with it.  I wonder whether
> it'd just make the code more fragile.  Still, it might be useful to
> verify that things are happening as we expect.

After sleeping on it, I elected to add the Asserts as suggested,
except in v12 where they'd have had to look different.  That's
not strictly a matter of laziness: v12's next release will be its
last, and I've been burned often enough to become very hesitant
about pushing even slightly-questionable code into an EOL release.

Pushed with that change.

            regards, tom lane