Thread: EXPLAIN(VERBOSE) to CTE with SEARCH BREADTH FIRST fails
Hi, While working on [1], we found that EXPLAIN(VERBOSE) to CTE with SEARCH BREADTH FIRST ends up ERROR. This can be reproduced at the current HEAD(4c3478859b7359912d7): =# create table graph0( f int, t int, label text); CREATE TABLE =# insert into graph0 values (1, 2, 'arc 1 -> 2'),(1, 3, 'arc 1 -> 3'),(2, 3, 'arc 2 -> 3'),(1, 4, 'arc 1 -> 4'),(4, 5, 'arc 4 -> 5'); INSERT 0 5 =# explain(verbose) with recursive search_graph(f, t, label) as ( select * from graph0 g union all select g.* from graph0 g, search_graph sg where g.f = sg.t ) search breadth first by f, t set seq select * from search_graph order by seq; ERROR: failed to find plan for CTE sg Is this a bug? [1] https://www.postgresql.org/message-id/flat/cf8501bcd95ba4d727cbba886ba9eea8@oss.nttdata.com Regards, -- Atsushi Torikoshi NTT DATA CORPORATION
torikoshia <torikoshia@oss.nttdata.com> writes: > While working on [1], we found that EXPLAIN(VERBOSE) to CTE with SEARCH > BREADTH FIRST ends up ERROR. Yeah. It's failing here: * We're deparsing a Plan tree so we don't have a CTE * list. But the only place we'd see a Var directly * referencing a CTE RTE is in a CteScan plan node, and we * can look into the subplan's tlist instead. if (!dpns->inner_plan) elog(ERROR, "failed to find plan for CTE %s", rte->eref->aliasname); The problematic Var is *not* in a CteScan plan node; it's in a WorkTableScan node. It's not clear to me whether this is a bug in the planner's handling of SEARCH BREADTH FIRST, or if the plan is as-intended and ruleutils.c is failing to cope. Either way, this deserves an open item... regards, tom lane
On 07.09.21 20:31, Tom Lane wrote: > torikoshia <torikoshia@oss.nttdata.com> writes: >> While working on [1], we found that EXPLAIN(VERBOSE) to CTE with SEARCH >> BREADTH FIRST ends up ERROR. > > Yeah. It's failing here: > > * We're deparsing a Plan tree so we don't have a CTE > * list. But the only place we'd see a Var directly > * referencing a CTE RTE is in a CteScan plan node, and we > * can look into the subplan's tlist instead. > > if (!dpns->inner_plan) > elog(ERROR, "failed to find plan for CTE %s", > rte->eref->aliasname); > > The problematic Var is *not* in a CteScan plan node; it's in a > WorkTableScan node. It's not clear to me whether this is a bug > in the planner's handling of SEARCH BREADTH FIRST, or if the plan > is as-intended and ruleutils.c is failing to cope. The search clause is resolved by the rewriter, so it's unlikely that the planner is doing something wrong. Either the rewriting produces something incorrect (but then one might expect that the query results would be wrong), or the structures constructed by rewriting are not easily handled by ruleutils.c. If we start from the example in the documentation <https://www.postgresql.org/docs/14/queries-with.html#QUERIES-WITH-RECURSIVE>: """ WITH RECURSIVE search_tree(id, link, data, depth) AS ( SELECT t.id, t.link, t.data, 0 FROM tree t UNION ALL SELECT t.id, t.link, t.data, depth + 1 FROM tree t, search_tree st WHERE t.id = st.link ) SELECT * FROM search_tree ORDER BY depth; To get a stable sort, add data columns as secondary sorting columns. """ In order to handle that part about the stable sort, the query constructed internally is something like WITH RECURSIVE search_tree(id, link, data, seq) AS ( SELECT t.id, t.link, t.data, ROW(0, id, link) FROM tree t UNION ALL SELECT t.id, t.link, t.data, ROW(seq.depth + 1, id, link) FROM tree t, search_tree st WHERE t.id = st.link ) SELECT * FROM search_tree ORDER BY seq; The bit "seq.depth" isn't really valid when typed in like that, I think, but of course internally this is all wired together with numbers rather than identifiers. I suspect that that is what ruleutils.c trips over.
On 2021-09-09 19:03, Peter Eisentraut wrote: > On 07.09.21 20:31, Tom Lane wrote: >> torikoshia <torikoshia@oss.nttdata.com> writes: >>> While working on [1], we found that EXPLAIN(VERBOSE) to CTE with >>> SEARCH >>> BREADTH FIRST ends up ERROR. >> >> Yeah. It's failing here: >> >> * We're deparsing a Plan tree so we don't have a >> CTE >> * list. But the only place we'd see a Var >> directly >> * referencing a CTE RTE is in a CteScan plan >> node, and we >> * can look into the subplan's tlist instead. >> >> if (!dpns->inner_plan) >> elog(ERROR, "failed to find plan for CTE %s", >> rte->eref->aliasname); >> >> The problematic Var is *not* in a CteScan plan node; it's in a >> WorkTableScan node. It's not clear to me whether this is a bug >> in the planner's handling of SEARCH BREADTH FIRST, or if the plan >> is as-intended and ruleutils.c is failing to cope. > > The search clause is resolved by the rewriter, so it's unlikely that > the planner is doing something wrong. Either the rewriting produces > something incorrect (but then one might expect that the query results > would be wrong), or the structures constructed by rewriting are not > easily handled by ruleutils.c. > > If we start from the example in the documentation > <https://www.postgresql.org/docs/14/queries-with.html#QUERIES-WITH-RECURSIVE>: > > """ > WITH RECURSIVE search_tree(id, link, data, depth) AS ( > SELECT t.id, t.link, t.data, 0 > FROM tree t > UNION ALL > SELECT t.id, t.link, t.data, depth + 1 > FROM tree t, search_tree st > WHERE t.id = st.link > ) > SELECT * FROM search_tree ORDER BY depth; > > To get a stable sort, add data columns as secondary sorting columns. > """ > > In order to handle that part about the stable sort, the query > constructed internally is something like > > WITH RECURSIVE search_tree(id, link, data, seq) AS ( > SELECT t.id, t.link, t.data, ROW(0, id, link) > FROM tree t > UNION ALL > SELECT t.id, t.link, t.data, ROW(seq.depth + 1, id, link) > FROM tree t, search_tree st > WHERE t.id = st.link > ) > SELECT * FROM search_tree ORDER BY seq; > > The bit "seq.depth" isn't really valid when typed in like that, I > think, but of course internally this is all wired together with > numbers rather than identifiers. I suspect that that is what > ruleutils.c trips over. Thanks for your advice, it seems right. EXPLAIN VERBOSE can be output without error when I assigned testing purpose CoercionForm to 'seq.depth + 1'. I've attached the patch for the changes made for this test for your reference, but I'm not sure it's appropriate for creating a new CoercionForm to fix the issue.. -- Regards, -- Atsushi Torikoshi NTT DATA CORPORATION
Attachment
On 9/10/21 10:10 AM, torikoshia wrote: > On 2021-09-09 19:03, Peter Eisentraut wrote: >> On 07.09.21 20:31, Tom Lane wrote: >>> torikoshia <torikoshia@oss.nttdata.com> writes: >>>> While working on [1], we found that EXPLAIN(VERBOSE) to CTE with >>>> SEARCH >>>> BREADTH FIRST ends up ERROR. >>> >>> Yeah. It's failing here: >>> >>> * We're deparsing a Plan tree so we don't have >>> a CTE >>> * list. But the only place we'd see a Var >>> directly >>> * referencing a CTE RTE is in a CteScan plan >>> node, and we >>> * can look into the subplan's tlist instead. >>> >>> if (!dpns->inner_plan) >>> elog(ERROR, "failed to find plan for CTE %s", >>> rte->eref->aliasname); >>> >>> The problematic Var is *not* in a CteScan plan node; it's in a >>> WorkTableScan node. It's not clear to me whether this is a bug >>> in the planner's handling of SEARCH BREADTH FIRST, or if the plan >>> is as-intended and ruleutils.c is failing to cope. >> >> The search clause is resolved by the rewriter, so it's unlikely that >> the planner is doing something wrong. Either the rewriting produces >> something incorrect (but then one might expect that the query results >> would be wrong), or the structures constructed by rewriting are not >> easily handled by ruleutils.c. >> >> If we start from the example in the documentation >> <https://www.postgresql.org/docs/14/queries-with.html#QUERIES-WITH-RECURSIVE>: >> >> >> """ >> WITH RECURSIVE search_tree(id, link, data, depth) AS ( >> SELECT t.id, t.link, t.data, 0 >> FROM tree t >> UNION ALL >> SELECT t.id, t.link, t.data, depth + 1 >> FROM tree t, search_tree st >> WHERE t.id = st.link >> ) >> SELECT * FROM search_tree ORDER BY depth; >> >> To get a stable sort, add data columns as secondary sorting columns. >> """ >> >> In order to handle that part about the stable sort, the query >> constructed internally is something like >> >> WITH RECURSIVE search_tree(id, link, data, seq) AS ( >> SELECT t.id, t.link, t.data, ROW(0, id, link) >> FROM tree t >> UNION ALL >> SELECT t.id, t.link, t.data, ROW(seq.depth + 1, id, link) >> FROM tree t, search_tree st >> WHERE t.id = st.link >> ) >> SELECT * FROM search_tree ORDER BY seq; >> >> The bit "seq.depth" isn't really valid when typed in like that, I >> think, but of course internally this is all wired together with >> numbers rather than identifiers. I suspect that that is what >> ruleutils.c trips over. > > Thanks for your advice, it seems right. > > EXPLAIN VERBOSE can be output without error when I assigned testing > purpose CoercionForm to 'seq.depth + 1'. > > I've attached the patch for the changes made for this test for your > reference, but I'm not sure it's appropriate for creating a new > CoercionForm to fix the issue.. This is listed as an open item for release 14. Is it planned to commit the patch? If not, we should close the item. cheers andrew -- Andrew Dunstan EDB: https://www.enterprisedb.com
Andrew Dunstan <andrew@dunslane.net> writes: > On 9/10/21 10:10 AM, torikoshia wrote: >> I've attached the patch for the changes made for this test for your >> reference, but I'm not sure it's appropriate for creating a new >> CoercionForm to fix the issue.. > This is listed as an open item for release 14. Is it planned to commit > the patch? If not, we should close the item. I do not think that patch is a proper solution, but we do need to do something about this. regards, tom lane
I wrote: > I do not think that patch is a proper solution, but we do need to do > something about this. I poked into this and decided it's an ancient omission within ruleutils.c. The reason we've not seen it before is probably that you can't get to the case through the parser. The SEARCH stuff is generating a query structure basically equivalent to regression=# with recursive cte (x,r) as ( select 42 as x, row(i, 2.3) as r from generate_series(1,3) i union all select x, row((c.r).f1, 4.5) from cte c ) select * from cte; ERROR: record type has not been registered and as you can see, expandRecordVariable fails to figure out what the referent of "c.r" is. I think that could be fixed (by looking into the non-recursive term), but given the lack of field demand, I'm not feeling that it's urgent. So the omission is pretty obvious from the misleading comment: actually, Vars referencing RTE_CTE RTEs can also appear in WorkTableScan nodes, and we're not doing anything to support that. But we only reach this code when trying to resolve a field of a Var of RECORD type, which is a case that it seems like the parser can't produce. It doesn't look too hard to fix: we just have to find the RecursiveUnion that goes with the WorkTableScan, and drill down into that, much as we would do in the CteScan case. See attached draft patch. I'm too tired to beat on this heavily or add a test case, but I have verified that it passes check-world and handles the example presented in this thread. regards, tom lane diff --git a/src/backend/utils/adt/ruleutils.c b/src/backend/utils/adt/ruleutils.c index b932a83827..b15bd81b9c 100644 --- a/src/backend/utils/adt/ruleutils.c +++ b/src/backend/utils/adt/ruleutils.c @@ -375,6 +375,8 @@ static void identify_join_columns(JoinExpr *j, RangeTblEntry *jrte, deparse_columns *colinfo); static char *get_rtable_name(int rtindex, deparse_context *context); static void set_deparse_plan(deparse_namespace *dpns, Plan *plan); +static Plan *find_recursive_union(deparse_namespace *dpns, + WorkTableScan *wtscan); static void push_child_plan(deparse_namespace *dpns, Plan *plan, deparse_namespace *save_dpns); static void pop_child_plan(deparse_namespace *dpns, @@ -4866,6 +4868,9 @@ set_deparse_plan(deparse_namespace *dpns, Plan *plan) * For a SubqueryScan, pretend the subplan is INNER referent. (We don't * use OUTER because that could someday conflict with the normal meaning.) * Likewise, for a CteScan, pretend the subquery's plan is INNER referent. + * For a WorkTableScan, locate the parent RecursiveUnion plan node and use + * that as INNER referent. + * * For ON CONFLICT .. UPDATE we just need the inner tlist to point to the * excluded expression's tlist. (Similar to the SubqueryScan we don't want * to reuse OUTER, it's used for RETURNING in some modify table cases, @@ -4876,6 +4881,9 @@ set_deparse_plan(deparse_namespace *dpns, Plan *plan) else if (IsA(plan, CteScan)) dpns->inner_plan = list_nth(dpns->subplans, ((CteScan *) plan)->ctePlanId - 1); + else if (IsA(plan, WorkTableScan)) + dpns->inner_plan = find_recursive_union(dpns, + (WorkTableScan *) plan); else if (IsA(plan, ModifyTable)) dpns->inner_plan = plan; else @@ -4899,6 +4907,29 @@ set_deparse_plan(deparse_namespace *dpns, Plan *plan) dpns->index_tlist = NIL; } +/* + * Locate the ancestor plan node that is the RecursiveUnion generating + * the WorkTableScan's work table. We can match on wtParam, since that + * should be unique within the plan tree. + */ +static Plan * +find_recursive_union(deparse_namespace *dpns, WorkTableScan *wtscan) +{ + ListCell *lc; + + foreach(lc, dpns->ancestors) + { + Plan *ancestor = (Plan *) lfirst(lc); + + if (IsA(ancestor, RecursiveUnion) && + ((RecursiveUnion *) ancestor)->wtParam == wtscan->wtParam) + return ancestor; + } + elog(ERROR, "could not find RecursiveUnion for WorkTableScan with wtParam %d", + wtscan->wtParam); + return NULL; +} + /* * push_child_plan: temporarily transfer deparsing attention to a child plan * @@ -7646,9 +7677,12 @@ get_name_for_var_field(Var *var, int fieldno, { /* * We're deparsing a Plan tree so we don't have a CTE - * list. But the only place we'd see a Var directly - * referencing a CTE RTE is in a CteScan plan node, and we - * can look into the subplan's tlist instead. + * list. But the only places we'd see a Var directly + * referencing a CTE RTE are in CteScan or WorkTableScan + * plan nodes. For those cases, set_deparse_plan arranged + * for dpns->inner_plan to be the plan node that emits the + * CTE or RecursiveUnion result, and we can look at its + * tlist instead. */ TargetEntry *tle; deparse_namespace save_dpns;
On 9/15/21 7:40 PM, Tom Lane wrote: > I wrote: >> I do not think that patch is a proper solution, but we do need to do >> something about this. > I poked into this and decided it's an ancient omission within ruleutils.c. > The reason we've not seen it before is probably that you can't get to the > case through the parser. The SEARCH stuff is generating a query structure > basically equivalent to > > regression=# with recursive cte (x,r) as ( > select 42 as x, row(i, 2.3) as r from generate_series(1,3) i > union all > select x, row((c.r).f1, 4.5) from cte c > ) > select * from cte; > ERROR: record type has not been registered > > and as you can see, expandRecordVariable fails to figure out what > the referent of "c.r" is. I think that could be fixed (by looking > into the non-recursive term), but given the lack of field demand, > I'm not feeling that it's urgent. > > So the omission is pretty obvious from the misleading comment: > actually, Vars referencing RTE_CTE RTEs can also appear in WorkTableScan > nodes, and we're not doing anything to support that. But we only reach > this code when trying to resolve a field of a Var of RECORD type, which > is a case that it seems like the parser can't produce. > > It doesn't look too hard to fix: we just have to find the RecursiveUnion > that goes with the WorkTableScan, and drill down into that, much as we > would do in the CteScan case. See attached draft patch. I'm too tired > to beat on this heavily or add a test case, but I have verified that it > passes check-world and handles the example presented in this thread. > > Looks like a nice simple fix. Thanks for working on this. cheers andrew -- Andrew Dunstan EDB: https://www.enterprisedb.com
På torsdag 16. september 2021 kl. 01:40:31, skrev Tom Lane <tgl@sss.pgh.pa.us>:
[...]
regression=# with recursive cte (x,r) as (
select 42 as x, row(i, 2.3) as r from generate_series(1,3) i
union all
select x, row((c.r).f1, 4.5) from cte c
)
select * from cte;
ERROR: record type has not been registered
FWIW; I saw this Open Item was set to fixed, but I'm still getting this error in 388726753b638fb9938883bdd057b2ffe6f950f5
--
Andreas Joseph Krogh
Andreas Joseph Krogh
Andreas Joseph Krogh <andreas@visena.com> writes: > På torsdag 16. september 2021 kl. 01:40:31, skrev Tom Lane <tgl@sss.pgh.pa.us > <mailto:tgl@sss.pgh.pa.us>>: > [...] > regression=# with recursive cte (x,r) as ( > select 42 as x, row(i, 2.3) as r from generate_series(1,3) i > union all > select x, row((c.r).f1, 4.5) from cte c > ) > select * from cte; > ERROR: record type has not been registered > FWIW; I saw this Open Item was set to fixed, but I'm still getting this error > in 388726753b638fb9938883bdd057b2ffe6f950f5 The open item was not about that parser shortcoming, nor did this patch claim to fix it. regards, tom lane
På torsdag 16. september 2021 kl. 18:57:39, skrev Tom Lane <tgl@sss.pgh.pa.us>:
[...]
> FWIW; I saw this Open Item was set to fixed, but I'm still getting this error
> in 388726753b638fb9938883bdd057b2ffe6f950f5
The open item was not about that parser shortcoming, nor did this patch
claim to fix it.
regards, tom lane
Ok, sorry for the noise.
--
Andreas Joseph Krogh
Andreas Joseph Krogh
On 2021-09-16 08:40, Tom Lane wrote: > I wrote: >> I do not think that patch is a proper solution, but we do need to do >> something about this. > > I poked into this and decided it's an ancient omission within > ruleutils.c. > The reason we've not seen it before is probably that you can't get to > the > case through the parser. The SEARCH stuff is generating a query > structure > basically equivalent to > > regression=# with recursive cte (x,r) as ( > select 42 as x, row(i, 2.3) as r from generate_series(1,3) i > union all > select x, row((c.r).f1, 4.5) from cte c > ) > select * from cte; > ERROR: record type has not been registered > > and as you can see, expandRecordVariable fails to figure out what > the referent of "c.r" is. I think that could be fixed (by looking > into the non-recursive term), but given the lack of field demand, > I'm not feeling that it's urgent. > > So the omission is pretty obvious from the misleading comment: > actually, Vars referencing RTE_CTE RTEs can also appear in > WorkTableScan > nodes, and we're not doing anything to support that. But we only reach > this code when trying to resolve a field of a Var of RECORD type, which > is a case that it seems like the parser can't produce. > > It doesn't look too hard to fix: we just have to find the > RecursiveUnion > that goes with the WorkTableScan, and drill down into that, much as we > would do in the CteScan case. See attached draft patch. I'm too tired > to beat on this heavily or add a test case, but I have verified that it > passes check-world and handles the example presented in this thread. > > regards, tom lane Thanks for looking into this and fixing it! -- Regards, -- Atsushi Torikoshi NTT DATA CORPORATION
Hi! It seems like this patch causes another problem. If I explain a simple row generator **without** verbose, it fails: postgres=# EXPLAIN (VERBOSE FALSE) WITH RECURSIVE gen (n) AS ( VALUES (1) UNION ALL SELECT n+1 FROM gen WHERE n < 3 ) SELECT * FROM gen ; ERROR: could not find RecursiveUnion for WorkTableScan with wtParam 0 That’s the new error message introduced by the patch. The same with verbose works just fine: postgres=# EXPLAIN (VERBOSE TRUE) WITH RECURSIVE gen (n) AS ( VALUES (1) UNION ALL SELECT n+1 FROM gen WHERE n < 3 ) SELECT * FROM gen ; QUERY PLAN ----------------------------------------------------------------------------- CTE Scan on gen (cost=2.95..3.57 rows=31 width=4) Output: gen.n CTE gen -> Recursive Union (cost=0.00..2.95 rows=31 width=4) -> Result (cost=0.00..0.01 rows=1 width=4) Output: 1 -> WorkTable Scan on gen gen_1 (cost=0.00..0.23 rows=3 width=4) Output: (gen_1.n + 1) Filter: (gen_1.n < 3) (9 rows) Both variants work fine before that patch (4ac0f450b698442c3273ddfe8eed0e1a7e56645f). Markus Winand winand.at > On 21.09.2021, at 14:43, torikoshia <torikoshia@oss.nttdata.com> wrote: > > On 2021-09-16 08:40, Tom Lane wrote: >> I wrote: >>> I do not think that patch is a proper solution, but we do need to do >>> something about this. >> I poked into this and decided it's an ancient omission within ruleutils.c. >> The reason we've not seen it before is probably that you can't get to the >> case through the parser. The SEARCH stuff is generating a query structure >> basically equivalent to >> regression=# with recursive cte (x,r) as ( >> select 42 as x, row(i, 2.3) as r from generate_series(1,3) i >> union all >> select x, row((c.r).f1, 4.5) from cte c >> ) >> select * from cte; >> ERROR: record type has not been registered >> and as you can see, expandRecordVariable fails to figure out what >> the referent of "c.r" is. I think that could be fixed (by looking >> into the non-recursive term), but given the lack of field demand, >> I'm not feeling that it's urgent. >> So the omission is pretty obvious from the misleading comment: >> actually, Vars referencing RTE_CTE RTEs can also appear in WorkTableScan >> nodes, and we're not doing anything to support that. But we only reach >> this code when trying to resolve a field of a Var of RECORD type, which >> is a case that it seems like the parser can't produce. >> It doesn't look too hard to fix: we just have to find the RecursiveUnion >> that goes with the WorkTableScan, and drill down into that, much as we >> would do in the CteScan case. See attached draft patch. I'm too tired >> to beat on this heavily or add a test case, but I have verified that it >> passes check-world and handles the example presented in this thread. >> regards, tom lane > > Thanks for looking into this and fixing it! > > -- > Regards, > > -- > Atsushi Torikoshi > NTT DATA CORPORATION > > > >
On 11.10.21 12:22, Markus Winand wrote: > Both variants work fine before that patch (4ac0f450b698442c3273ddfe8eed0e1a7e56645f). That commit is a message wording patch. Are you sure you meant that one?
> On 11.10.2021, at 16:27, Peter Eisentraut <peter.eisentraut@enterprisedb.com> wrote: > > On 11.10.21 12:22, Markus Winand wrote: >> Both variants work fine before that patch (4ac0f450b698442c3273ddfe8eed0e1a7e56645f). > > That commit is a message wording patch. Are you sure you meant that one? > What I meant is that it was still working on 4ac0f450b698442c3273ddfe8eed0e1a7e56645f, but not on the next (3f50b82639637c9908afa2087de7588450aa866b). -markus
Markus Winand <markus.winand@winand.at> writes: > What I meant is that it was still working on 4ac0f450b698442c3273ddfe8eed0e1a7e56645f, but not on the next (3f50b82639637c9908afa2087de7588450aa866b). Yeah, silly oversight in that patch. Will push a fix shortly. regards, tom lane