Re: BUG #19106: Potential regression with CTE materialization planning in Postgres 18 - Mailing list pgsql-bugs
| From | Tom Lane |
|---|---|
| Subject | Re: BUG #19106: Potential regression with CTE materialization planning in Postgres 18 |
| Date | |
| Msg-id | 1980046.1762808758@sss.pgh.pa.us Whole thread Raw |
| In response to | Re: BUG #19106: Potential regression with CTE materialization planning in Postgres 18 (Tom Lane <tgl@sss.pgh.pa.us>) |
| List | pgsql-bugs |
[ cc'ing Peter and Vik for possible input on SQL-standard question ]
I wrote:
> This worked up until commit b0cc0a71e, and since then it hits an
> assertion failure in check_agglevels_and_constraints(); or if you
> don't have asserts enabled then the planner gets confused, because
> the max() aggregate function is given the wrong agglevelsup.
> I need to think through what is the correct behavior for cross-CTE
> references like these. Sadly, this is too late for this week's
> releases ...
Here is a draft fix for this. What it basically decides is that
commit b0cc0a71e was in error to suppose that an outer CTE reference
should work like an outer Var reference. In this even-more-simplified
test case:
WITH a AS (
SELECT id FROM (VALUES (1), (2)) AS v(id)
),
b AS (
SELECT max((SELECT sum(id) FROM a)) AS agg
)
SELECT agg FROM b;
it's pretty obvious that the reference to "a" should not cause us to
act as though the max() aggregate belongs to the outer "WITH ...
SELECT agg FROM b" query level. However, if it doesn't belong there,
where does it belong?
The draft patch takes the approach of forcing agglevelsup to zero
anytime we find a CTE reference within an aggregate's arguments
(unless it is to a CTE defined inside those arguments). That
means such an aggregate always has its syntactic level and cannot
be treated as an outer aggregate even if it otherwise references
(only) outer Vars.
I don't have a huge amount of confidence in this being the final
answer; it seems like it might still be too simplistic. But I'm
not sure what to do instead. It's at least less likely to break
cases that worked before b0cc0a71e, since "it's level zero" is
pretty much the default answer anyway.
I looked at the SQL standard for possible guidance and found none:
they disallow subqueries altogether within aggregate arguments,
so they need not consider such cases. I am curious though whether
Peter or Vik know if the committee ever considered relaxing that
restriction, and if so whether they stopped to think about this
particular point.
regards, tom lane
diff --git a/src/backend/parser/parse_agg.c b/src/backend/parser/parse_agg.c
index 3254c83cc6c..5de07a2aa9a 100644
--- a/src/backend/parser/parse_agg.c
+++ b/src/backend/parser/parse_agg.c
@@ -795,9 +795,16 @@ check_agg_arguments_walker(Node *node,
if (IsA(node, RangeTblEntry))
{
/*
- * CTE references act similarly to Vars of the CTE's level. Without
- * this we might conclude that the Agg can be evaluated above the CTE,
- * leading to trouble.
+ * If we find a nonlocal CTE reference, force the Agg to be given
+ * level zero, i.e. make its semantic level match its syntactic level.
+ * Without this we might conclude that the Agg can be evaluated above
+ * the CTE (if it contains only Vars from above the CTE), leading to
+ * trouble. On the other hand, we shouldn't allow a CTE reference by
+ * itself to promote an Agg to be an outer Agg; that fails for
+ * cross-references to sibling CTEs. The best solution seems to be to
+ * force the Agg to level zero. (The SQL spec provides no guidance
+ * here; they disallow subqueries within aggregates, and therefore
+ * don't reach the question of what to do with CTE references.)
*/
RangeTblEntry *rte = (RangeTblEntry *) node;
@@ -810,9 +817,8 @@ check_agg_arguments_walker(Node *node,
/* ignore local CTEs of subqueries */
if (ctelevelsup >= 0)
{
- if (context->min_varlevel < 0 ||
- context->min_varlevel > ctelevelsup)
- context->min_varlevel = ctelevelsup;
+ /* nonlocal CTE, so force Agg to level zero */
+ context->min_varlevel = 0;
}
}
return false; /* allow range_table_walker to continue */
diff --git a/src/test/regress/expected/with.out b/src/test/regress/expected/with.out
index 86fdb85c6c5..bb45cb83c62 100644
--- a/src/test/regress/expected/with.out
+++ b/src/test/regress/expected/with.out
@@ -2331,6 +2331,34 @@ from int4_tbl i4;
-2147483647 | 1
(5 rows)
+--
+-- test for bug #19106: interaction of WITH with aggregates
+--
+-- the fix for #19055 was too aggressive and broke this case
+explain (verbose, costs off)
+with a as ( select id from (values (1), (2)) as v(id) ),
+ b as ( select max((select sum(id) from a)) as agg )
+select agg from b;
+ QUERY PLAN
+--------------------------------------------
+ Aggregate
+ Output: max((InitPlan expr_1).col1)
+ InitPlan expr_1
+ -> Aggregate
+ Output: sum("*VALUES*".column1)
+ -> Values Scan on "*VALUES*"
+ Output: "*VALUES*".column1
+ -> Result
+(8 rows)
+
+with a as ( select id from (values (1), (2)) as v(id) ),
+ b as ( select max((select sum(id) from a)) as agg )
+select agg from b;
+ agg
+-----
+ 3
+(1 row)
+
--
-- test for nested-recursive-WITH bug
--
diff --git a/src/test/regress/sql/with.sql b/src/test/regress/sql/with.sql
index d88d5abb91a..e199662edd3 100644
--- a/src/test/regress/sql/with.sql
+++ b/src/test/regress/sql/with.sql
@@ -1111,6 +1111,19 @@ select f1, (with cte1(x,y) as (select 1,2)
select count((select i4.f1 from cte1))) as ss
from int4_tbl i4;
+--
+-- test for bug #19106: interaction of WITH with aggregates
+--
+-- the fix for #19055 was too aggressive and broke this case
+explain (verbose, costs off)
+with a as ( select id from (values (1), (2)) as v(id) ),
+ b as ( select max((select sum(id) from a)) as agg )
+select agg from b;
+
+with a as ( select id from (values (1), (2)) as v(id) ),
+ b as ( select max((select sum(id) from a)) as agg )
+select agg from b;
+
--
-- test for nested-recursive-WITH bug
--
pgsql-bugs by date: