Thread: Assert failure in CTE inlining with view and correlated subquery

Assert failure in CTE inlining with view and correlated subquery

From
Tomas Vondra
Date:
Hi,

it seems there's something wrong with CTE inlining when there's a view
containing a correlated subquery referencing the CTE. Consider a simple
example like this:

    create table results (
      id          serial primary key,
      run         text,
      tps         float4
    );

    create view results_agg as
    with base_tps as (
      select run, tps from results
    )
    select
        run,
        count(*) as runs,

        (select tps from base_tps b where b.run = r.run) AS base_tps

    from results r
    group by
        run
    order by
        run;

    explain SELECT run FROM results_agg ORDER BY 1;


This crashes on this assert in inline_cte():

    Assert(context.refcount == 0);

because the refcount value remains 1. There's a backtrace attached.

I don't know why exactly this happens, my knowledge of CTE inlining is
somewhat limited. The counter is clearly out of sync


but a couple more observations:

1) it fails all the way back to PG12, where CTE inlining was added

2) it does not happen if the CTE is defined as MATERIALIZED

                   QUERY PLAN
    -----------------------------------------
     Subquery Scan on results_agg
       ->  Sort
             Sort Key: r.run
             CTE base_tps
               ->  Seq Scan on results
             ->  HashAggregate
                   Group Key: r.run
                   ->  Seq Scan on results r
    (8 rows)

3) without asserts, it seems to work and the query generates this plan

                   QUERY PLAN
    -----------------------------------------
     Subquery Scan on results_agg
       ->  Sort
             Sort Key: r.run
             ->  HashAggregate
                   Group Key: r.run
                   ->  Seq Scan on results r
    (6 rows)

4) it does not seem to happen without the view, i.e. this works

    explain
    with base_tps as (
      select run, tps from results
    )
    select run from (
      select
        run,
        count(*) as runs,

        (select tps from base_tps b where b.run = r.run) AS base_tps

    from results r
    group by
        run
    order by
        run
    ) results_agg order by 1;

The difference between plans in (2) and (3) is interesting, because it
seems the CTE got inlined, so why was the refcount not decremented?


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Attachment

Re: Assert failure in CTE inlining with view and correlated subquery

From
Richard Guo
Date:

On Thu, Apr 21, 2022 at 5:33 AM Tomas Vondra <tomas.vondra@enterprisedb.com> wrote:

The difference between plans in (2) and (3) is interesting, because it
seems the CTE got inlined, so why was the refcount not decremented?

The query is not actually referencing the cte. So the cte range table
entry would not appear anywhere in the query tree. That's why refcount
is not decremented after inline cte walker.

If we explicitly reference the cte in the query, say in the targetlist,
it would then work.

# explain (costs off) SELECT * FROM results_agg ORDER BY 1;
              QUERY PLAN
---------------------------------------
 Sort
   Sort Key: r.run
   ->  HashAggregate
         Group Key: r.run
         ->  Seq Scan on results r
         SubPlan 1
           ->  Seq Scan on results
                 Filter: (run = r.run)
(8 rows)

IMO the culprit is that we incorrectly set cterefcount to one while
actually the cte is not referenced at all.

Thanks
Richard 

Re: Assert failure in CTE inlining with view and correlated subquery

From
Richard Guo
Date:

On Thu, Apr 21, 2022 at 5:33 AM Tomas Vondra <tomas.vondra@enterprisedb.com> wrote:

it seems there's something wrong with CTE inlining when there's a view
containing a correlated subquery referencing the CTE. 

BTW, seems view is not a necessary condition to reproduce this issue.
For instance:

create table t (a int, b int);

explain (costs off) select a from
(
    with t_cte as (select a, b from t)
    select
        a,
        (select b from t_cte where t_cte.a = t.a) AS t_sub
    from t
) sub;

Thanks
Richard

Re: Assert failure in CTE inlining with view and correlated subquery

From
Richard Guo
Date:

On Thu, Apr 21, 2022 at 3:51 PM Richard Guo <guofenglinux@gmail.com> wrote:

On Thu, Apr 21, 2022 at 5:33 AM Tomas Vondra <tomas.vondra@enterprisedb.com> wrote:

it seems there's something wrong with CTE inlining when there's a view
containing a correlated subquery referencing the CTE. 

BTW, seems view is not a necessary condition to reproduce this issue.
For instance:

create table t (a int, b int);

explain (costs off) select a from
(
    with t_cte as (select a, b from t)
    select
        a,
        (select b from t_cte where t_cte.a = t.a) AS t_sub
    from t
) sub;

Further debugging shows that in this repro the reference to the CTE is
removed when generating paths for the subquery 'sub', where we would try
to remove subquery targetlist items that are not needed. So for the
items we are to remove, maybe we need to check if they contain CTEs and
if so decrease cterefcount of the CTEs correspondingly.

Thanks
Richard

Re: Assert failure in CTE inlining with view and correlated subquery

From
Tomas Vondra
Date:
On 4/21/22 10:29, Richard Guo wrote:
> 
> On Thu, Apr 21, 2022 at 3:51 PM Richard Guo <guofenglinux@gmail.com
> <mailto:guofenglinux@gmail.com>> wrote:
> 
> 
>     On Thu, Apr 21, 2022 at 5:33 AM Tomas Vondra
>     <tomas.vondra@enterprisedb.com
>     <mailto:tomas.vondra@enterprisedb.com>> wrote:
> 
> 
>         it seems there's something wrong with CTE inlining when there's
>         a view
>         containing a correlated subquery referencing the CTE. 
> 
> 
>     BTW, seems view is not a necessary condition to reproduce this issue.
>     For instance:
> 
>     create table t (a int, b int);
> 
>     explain (costs off) select a from
>     (
>         with t_cte as (select a, b from t)
>         select
>             a,
>             (select b from t_cte where t_cte.a = t.a) AS t_sub
>         from t
>     ) sub;
> 
> 
> Further debugging shows that in this repro the reference to the CTE is
> removed when generating paths for the subquery 'sub', where we would try
> to remove subquery targetlist items that are not needed. So for the
> items we are to remove, maybe we need to check if they contain CTEs and
> if so decrease cterefcount of the CTEs correspondingly.
> 

Right, at some point we remove the unnecessary targetlist entries, but
that ignores the entry may reference a CTE. That's pretty much what I
meant by the counter being "out of sync".

Updating the counter while removing the entry is one option, but maybe
we could simply delay counting the CTE references until after that?


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Assert failure in CTE inlining with view and correlated subquery

From
Tom Lane
Date:
Tomas Vondra <tomas.vondra@enterprisedb.com> writes:
> On 4/21/22 10:29, Richard Guo wrote:
>> Further debugging shows that in this repro the reference to the CTE is
>> removed when generating paths for the subquery 'sub', where we would try
>> to remove subquery targetlist items that are not needed. So for the
>> items we are to remove, maybe we need to check if they contain CTEs and
>> if so decrease cterefcount of the CTEs correspondingly.

> Right, at some point we remove the unnecessary targetlist entries, but
> that ignores the entry may reference a CTE. That's pretty much what I
> meant by the counter being "out of sync".
> Updating the counter while removing the entry is one option, but maybe
> we could simply delay counting the CTE references until after that?

I think we should just drop this cross-check altogether; it is not nearly
useful enough to justify the work that'd be involved in maintaining
cterefcount accurately for all such transformations.  All it's really
there for is to be sure that we don't need to make a subplan for the
inlined CTE.

There are two downstream consumers of cte_plan_ids, which currently just
have Asserts that we made a subplan.  I think it'd be worth converting
those to real run-time tests, which leads me to something more or less as
attached.

            regards, tom lane

diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index 998212dda8..d84f66a81b 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -2804,7 +2804,8 @@ set_cte_pathlist(PlannerInfo *root, RelOptInfo *rel, RangeTblEntry *rte)
     if (ndx >= list_length(cteroot->cte_plan_ids))
         elog(ERROR, "could not find plan for CTE \"%s\"", rte->ctename);
     plan_id = list_nth_int(cteroot->cte_plan_ids, ndx);
-    Assert(plan_id > 0);
+    if (plan_id <= 0)
+        elog(ERROR, "no plan was made for CTE \"%s\"", rte->ctename);
     cteplan = (Plan *) list_nth(root->glob->subplans, plan_id - 1);

     /* Mark rel with estimated output rows, width, etc */
diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index 95476ada0b..7905bc4654 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -3898,7 +3898,8 @@ create_ctescan_plan(PlannerInfo *root, Path *best_path,
     if (ndx >= list_length(cteroot->cte_plan_ids))
         elog(ERROR, "could not find plan for CTE \"%s\"", rte->ctename);
     plan_id = list_nth_int(cteroot->cte_plan_ids, ndx);
-    Assert(plan_id > 0);
+    if (plan_id <= 0)
+        elog(ERROR, "no plan was made for CTE \"%s\"", rte->ctename);
     foreach(lc, cteroot->init_plans)
     {
         ctesplan = (SubPlan *) lfirst(lc);
diff --git a/src/backend/optimizer/plan/subselect.c b/src/backend/optimizer/plan/subselect.c
index 863e0e24a1..df4ca12919 100644
--- a/src/backend/optimizer/plan/subselect.c
+++ b/src/backend/optimizer/plan/subselect.c
@@ -61,7 +61,6 @@ typedef struct inline_cte_walker_context
 {
     const char *ctename;        /* name and relative level of target CTE */
     int            levelsup;
-    int            refcount;        /* number of remaining references */
     Query       *ctequery;        /* query to substitute */
 } inline_cte_walker_context;

@@ -1157,13 +1156,9 @@ inline_cte(PlannerInfo *root, CommonTableExpr *cte)
     context.ctename = cte->ctename;
     /* Start at levelsup = -1 because we'll immediately increment it */
     context.levelsup = -1;
-    context.refcount = cte->cterefcount;
     context.ctequery = castNode(Query, cte->ctequery);

     (void) inline_cte_walker((Node *) root->parse, &context);
-
-    /* Assert we replaced all references */
-    Assert(context.refcount == 0);
 }

 static bool
@@ -1226,9 +1221,6 @@ inline_cte_walker(Node *node, inline_cte_walker_context *context)
             rte->coltypes = NIL;
             rte->coltypmods = NIL;
             rte->colcollations = NIL;
-
-            /* Count the number of replacements we've done */
-            context->refcount--;
         }

         return false;
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index c5ab53e05c..244d1e1197 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -241,7 +241,8 @@ struct PlannerInfo

     List       *init_plans;        /* init SubPlans for query */

-    List       *cte_plan_ids;    /* per-CTE-item list of subplan IDs */
+    List       *cte_plan_ids;    /* per-CTE-item list of subplan IDs (or -1 if
+                                 * no subplan was made for that CTE) */

     List       *multiexpr_params;    /* List of Lists of Params for MULTIEXPR
                                      * subquery outputs */

Re: Assert failure in CTE inlining with view and correlated subquery

From
Tomas Vondra
Date:
On 4/21/22 21:03, Tom Lane wrote:
> Tomas Vondra <tomas.vondra@enterprisedb.com> writes:
>> On 4/21/22 10:29, Richard Guo wrote:
>>> Further debugging shows that in this repro the reference to the CTE is
>>> removed when generating paths for the subquery 'sub', where we would try
>>> to remove subquery targetlist items that are not needed. So for the
>>> items we are to remove, maybe we need to check if they contain CTEs and
>>> if so decrease cterefcount of the CTEs correspondingly.
> 
>> Right, at some point we remove the unnecessary targetlist entries, but
>> that ignores the entry may reference a CTE. That's pretty much what I
>> meant by the counter being "out of sync".
>> Updating the counter while removing the entry is one option, but maybe
>> we could simply delay counting the CTE references until after that?
> 
> I think we should just drop this cross-check altogether; it is not nearly
> useful enough to justify the work that'd be involved in maintaining
> cterefcount accurately for all such transformations.  All it's really
> there for is to be sure that we don't need to make a subplan for the
> inlined CTE.
> 
> There are two downstream consumers of cte_plan_ids, which currently just
> have Asserts that we made a subplan.  I think it'd be worth converting
> those to real run-time tests, which leads me to something more or less as
> attached.
> 

WFM. I'm not particularly attached to the assert, so if you say it's not
worth it let's get rid of it.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Assert failure in CTE inlining with view and correlated subquery

From
Tom Lane
Date:
Tomas Vondra <tomas.vondra@enterprisedb.com> writes:
> On 4/21/22 21:03, Tom Lane wrote:
>> I think we should just drop this cross-check altogether; it is not nearly
>> useful enough to justify the work that'd be involved in maintaining
>> cterefcount accurately for all such transformations.  All it's really
>> there for is to be sure that we don't need to make a subplan for the
>> inlined CTE.

> WFM. I'm not particularly attached to the assert, so if you say it's not
> worth it let's get rid of it.

Done.

            regards, tom lane