Thread: Assert failure in CTE inlining with view and correlated subquery
Hi, it seems there's something wrong with CTE inlining when there's a view containing a correlated subquery referencing the CTE. Consider a simple example like this: create table results ( id serial primary key, run text, tps float4 ); create view results_agg as with base_tps as ( select run, tps from results ) select run, count(*) as runs, (select tps from base_tps b where b.run = r.run) AS base_tps from results r group by run order by run; explain SELECT run FROM results_agg ORDER BY 1; This crashes on this assert in inline_cte(): Assert(context.refcount == 0); because the refcount value remains 1. There's a backtrace attached. I don't know why exactly this happens, my knowledge of CTE inlining is somewhat limited. The counter is clearly out of sync but a couple more observations: 1) it fails all the way back to PG12, where CTE inlining was added 2) it does not happen if the CTE is defined as MATERIALIZED QUERY PLAN ----------------------------------------- Subquery Scan on results_agg -> Sort Sort Key: r.run CTE base_tps -> Seq Scan on results -> HashAggregate Group Key: r.run -> Seq Scan on results r (8 rows) 3) without asserts, it seems to work and the query generates this plan QUERY PLAN ----------------------------------------- Subquery Scan on results_agg -> Sort Sort Key: r.run -> HashAggregate Group Key: r.run -> Seq Scan on results r (6 rows) 4) it does not seem to happen without the view, i.e. this works explain with base_tps as ( select run, tps from results ) select run from ( select run, count(*) as runs, (select tps from base_tps b where b.run = r.run) AS base_tps from results r group by run order by run ) results_agg order by 1; The difference between plans in (2) and (3) is interesting, because it seems the CTE got inlined, so why was the refcount not decremented? regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Attachment
On Thu, Apr 21, 2022 at 5:33 AM Tomas Vondra <tomas.vondra@enterprisedb.com> wrote:
The difference between plans in (2) and (3) is interesting, because it
seems the CTE got inlined, so why was the refcount not decremented?
entry would not appear anywhere in the query tree. That's why refcount
is not decremented after inline cte walker.
If we explicitly reference the cte in the query, say in the targetlist,
it would then work.
# explain (costs off) SELECT * FROM results_agg ORDER BY 1;
QUERY PLAN
---------------------------------------
Sort
Sort Key: r.run
-> HashAggregate
Group Key: r.run
-> Seq Scan on results r
SubPlan 1
-> Seq Scan on results
Filter: (run = r.run)
(8 rows)
IMO the culprit is that we incorrectly set cterefcount to one while
actually the cte is not referenced at all.
Thanks
Richard
On Thu, Apr 21, 2022 at 5:33 AM Tomas Vondra <tomas.vondra@enterprisedb.com> wrote:
it seems there's something wrong with CTE inlining when there's a view
containing a correlated subquery referencing the CTE.
BTW, seems view is not a necessary condition to reproduce this issue.
For instance:create table t (a int, b int);
explain (costs off) select a from
(
with t_cte as (select a, b from t)
select
a,
(select b from t_cte where t_cte.a = t.a) AS t_sub
from t
) sub;
Thanks
Richard
On Thu, Apr 21, 2022 at 3:51 PM Richard Guo <guofenglinux@gmail.com> wrote:
On Thu, Apr 21, 2022 at 5:33 AM Tomas Vondra <tomas.vondra@enterprisedb.com> wrote:
it seems there's something wrong with CTE inlining when there's a view
containing a correlated subquery referencing the CTE.BTW, seems view is not a necessary condition to reproduce this issue.For instance:
create table t (a int, b int);
explain (costs off) select a from
(
with t_cte as (select a, b from t)
select
a,
(select b from t_cte where t_cte.a = t.a) AS t_sub
from t
) sub;
Further debugging shows that in this repro the reference to the CTE is
removed when generating paths for the subquery 'sub', where we would tryto remove subquery targetlist items that are not needed. So for the
items we are to remove, maybe we need to check if they contain CTEs and
if so decrease cterefcount of the CTEs correspondingly.
Thanks
Richard
On 4/21/22 10:29, Richard Guo wrote: > > On Thu, Apr 21, 2022 at 3:51 PM Richard Guo <guofenglinux@gmail.com > <mailto:guofenglinux@gmail.com>> wrote: > > > On Thu, Apr 21, 2022 at 5:33 AM Tomas Vondra > <tomas.vondra@enterprisedb.com > <mailto:tomas.vondra@enterprisedb.com>> wrote: > > > it seems there's something wrong with CTE inlining when there's > a view > containing a correlated subquery referencing the CTE. > > > BTW, seems view is not a necessary condition to reproduce this issue. > For instance: > > create table t (a int, b int); > > explain (costs off) select a from > ( > with t_cte as (select a, b from t) > select > a, > (select b from t_cte where t_cte.a = t.a) AS t_sub > from t > ) sub; > > > Further debugging shows that in this repro the reference to the CTE is > removed when generating paths for the subquery 'sub', where we would try > to remove subquery targetlist items that are not needed. So for the > items we are to remove, maybe we need to check if they contain CTEs and > if so decrease cterefcount of the CTEs correspondingly. > Right, at some point we remove the unnecessary targetlist entries, but that ignores the entry may reference a CTE. That's pretty much what I meant by the counter being "out of sync". Updating the counter while removing the entry is one option, but maybe we could simply delay counting the CTE references until after that? regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Tomas Vondra <tomas.vondra@enterprisedb.com> writes: > On 4/21/22 10:29, Richard Guo wrote: >> Further debugging shows that in this repro the reference to the CTE is >> removed when generating paths for the subquery 'sub', where we would try >> to remove subquery targetlist items that are not needed. So for the >> items we are to remove, maybe we need to check if they contain CTEs and >> if so decrease cterefcount of the CTEs correspondingly. > Right, at some point we remove the unnecessary targetlist entries, but > that ignores the entry may reference a CTE. That's pretty much what I > meant by the counter being "out of sync". > Updating the counter while removing the entry is one option, but maybe > we could simply delay counting the CTE references until after that? I think we should just drop this cross-check altogether; it is not nearly useful enough to justify the work that'd be involved in maintaining cterefcount accurately for all such transformations. All it's really there for is to be sure that we don't need to make a subplan for the inlined CTE. There are two downstream consumers of cte_plan_ids, which currently just have Asserts that we made a subplan. I think it'd be worth converting those to real run-time tests, which leads me to something more or less as attached. regards, tom lane diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c index 998212dda8..d84f66a81b 100644 --- a/src/backend/optimizer/path/allpaths.c +++ b/src/backend/optimizer/path/allpaths.c @@ -2804,7 +2804,8 @@ set_cte_pathlist(PlannerInfo *root, RelOptInfo *rel, RangeTblEntry *rte) if (ndx >= list_length(cteroot->cte_plan_ids)) elog(ERROR, "could not find plan for CTE \"%s\"", rte->ctename); plan_id = list_nth_int(cteroot->cte_plan_ids, ndx); - Assert(plan_id > 0); + if (plan_id <= 0) + elog(ERROR, "no plan was made for CTE \"%s\"", rte->ctename); cteplan = (Plan *) list_nth(root->glob->subplans, plan_id - 1); /* Mark rel with estimated output rows, width, etc */ diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c index 95476ada0b..7905bc4654 100644 --- a/src/backend/optimizer/plan/createplan.c +++ b/src/backend/optimizer/plan/createplan.c @@ -3898,7 +3898,8 @@ create_ctescan_plan(PlannerInfo *root, Path *best_path, if (ndx >= list_length(cteroot->cte_plan_ids)) elog(ERROR, "could not find plan for CTE \"%s\"", rte->ctename); plan_id = list_nth_int(cteroot->cte_plan_ids, ndx); - Assert(plan_id > 0); + if (plan_id <= 0) + elog(ERROR, "no plan was made for CTE \"%s\"", rte->ctename); foreach(lc, cteroot->init_plans) { ctesplan = (SubPlan *) lfirst(lc); diff --git a/src/backend/optimizer/plan/subselect.c b/src/backend/optimizer/plan/subselect.c index 863e0e24a1..df4ca12919 100644 --- a/src/backend/optimizer/plan/subselect.c +++ b/src/backend/optimizer/plan/subselect.c @@ -61,7 +61,6 @@ typedef struct inline_cte_walker_context { const char *ctename; /* name and relative level of target CTE */ int levelsup; - int refcount; /* number of remaining references */ Query *ctequery; /* query to substitute */ } inline_cte_walker_context; @@ -1157,13 +1156,9 @@ inline_cte(PlannerInfo *root, CommonTableExpr *cte) context.ctename = cte->ctename; /* Start at levelsup = -1 because we'll immediately increment it */ context.levelsup = -1; - context.refcount = cte->cterefcount; context.ctequery = castNode(Query, cte->ctequery); (void) inline_cte_walker((Node *) root->parse, &context); - - /* Assert we replaced all references */ - Assert(context.refcount == 0); } static bool @@ -1226,9 +1221,6 @@ inline_cte_walker(Node *node, inline_cte_walker_context *context) rte->coltypes = NIL; rte->coltypmods = NIL; rte->colcollations = NIL; - - /* Count the number of replacements we've done */ - context->refcount--; } return false; diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h index c5ab53e05c..244d1e1197 100644 --- a/src/include/nodes/pathnodes.h +++ b/src/include/nodes/pathnodes.h @@ -241,7 +241,8 @@ struct PlannerInfo List *init_plans; /* init SubPlans for query */ - List *cte_plan_ids; /* per-CTE-item list of subplan IDs */ + List *cte_plan_ids; /* per-CTE-item list of subplan IDs (or -1 if + * no subplan was made for that CTE) */ List *multiexpr_params; /* List of Lists of Params for MULTIEXPR * subquery outputs */
On 4/21/22 21:03, Tom Lane wrote: > Tomas Vondra <tomas.vondra@enterprisedb.com> writes: >> On 4/21/22 10:29, Richard Guo wrote: >>> Further debugging shows that in this repro the reference to the CTE is >>> removed when generating paths for the subquery 'sub', where we would try >>> to remove subquery targetlist items that are not needed. So for the >>> items we are to remove, maybe we need to check if they contain CTEs and >>> if so decrease cterefcount of the CTEs correspondingly. > >> Right, at some point we remove the unnecessary targetlist entries, but >> that ignores the entry may reference a CTE. That's pretty much what I >> meant by the counter being "out of sync". >> Updating the counter while removing the entry is one option, but maybe >> we could simply delay counting the CTE references until after that? > > I think we should just drop this cross-check altogether; it is not nearly > useful enough to justify the work that'd be involved in maintaining > cterefcount accurately for all such transformations. All it's really > there for is to be sure that we don't need to make a subplan for the > inlined CTE. > > There are two downstream consumers of cte_plan_ids, which currently just > have Asserts that we made a subplan. I think it'd be worth converting > those to real run-time tests, which leads me to something more or less as > attached. > WFM. I'm not particularly attached to the assert, so if you say it's not worth it let's get rid of it. regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Tomas Vondra <tomas.vondra@enterprisedb.com> writes: > On 4/21/22 21:03, Tom Lane wrote: >> I think we should just drop this cross-check altogether; it is not nearly >> useful enough to justify the work that'd be involved in maintaining >> cterefcount accurately for all such transformations. All it's really >> there for is to be sure that we don't need to make a subplan for the >> inlined CTE. > WFM. I'm not particularly attached to the assert, so if you say it's not > worth it let's get rid of it. Done. regards, tom lane