Re: Inconsistent Behavior of GROUP BY ROLLUP in v17 vs master - Mailing list pgsql-hackers
From | Tom Lane |
---|---|
Subject | Re: Inconsistent Behavior of GROUP BY ROLLUP in v17 vs master |
Date | |
Msg-id | 329840.1760977678@sss.pgh.pa.us Whole thread Raw |
In response to | Re: Inconsistent Behavior of GROUP BY ROLLUP in v17 vs master (Tom Lane <tgl@sss.pgh.pa.us>) |
List | pgsql-hackers |
I wrote: > I like the concept here, but not so much the details. Pulling > expand_grouping_sets out of preprocess_grouping_sets feels weird. > I guess it's all right given that preprocess_grouping_sets doesn't > manipulate the parse tree otherwise, but you didn't update the comment > for preprocess_grouping_sets. Also it might be a good idea to have a > test case demonstrating why v2 wasn't good enough. Here's a v4 with some concrete suggestions for comment changes, plus the extra test case. (I did some tiny cosmetic fooling with preprocess_grouping_sets too, because the order of its initial operations seemed quite weird to me.) regards, tom lane diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c index 342d782c74b..c4fd646b999 100644 --- a/src/backend/optimizer/plan/planner.c +++ b/src/backend/optimizer/plan/planner.c @@ -1127,15 +1127,28 @@ subquery_planner(PlannerGlobal *glob, Query *parse, char *plan_name, if (parse->hasTargetSRFs) parse->hasTargetSRFs = expression_returns_set((Node *) parse->targetList); + /* + * If we have grouping sets, expand the groupingSets tree of this query to + * a flat list of grouping sets. We need to do this before optimizing + * HAVING, since we can't easily tell if there's an empty grouping set + * until we have this representation. + */ + if (parse->groupingSets) + { + parse->groupingSets = + expand_grouping_sets(parse->groupingSets, parse->groupDistinct, -1); + } + /* * In some cases we may want to transfer a HAVING clause into WHERE. We * cannot do so if the HAVING clause contains aggregates (obviously) or * volatile functions (since a HAVING clause is supposed to be executed - * only once per group). We also can't do this if there are any nonempty - * grouping sets and the clause references any columns that are nullable - * by the grouping sets; moving such a clause into WHERE would potentially - * change the results. (If there are only empty grouping sets, then the - * HAVING clause must be degenerate as discussed below.) + * only once per group). We also can't do this if there are any grouping + * sets and the clause references any columns that are nullable by the + * grouping sets; the nulled values of those columns are not available + * before the grouping step. (The test on groupClause might seem wrong, + * but it's okay: it's just an optimization to avoid running pull_varnos + * when there cannot be any Vars in the HAVING clause.) * * Also, it may be that the clause is so expensive to execute that we're * better off doing it only once per group, despite the loss of @@ -1145,19 +1158,19 @@ subquery_planner(PlannerGlobal *glob, Query *parse, char *plan_name, * clause into WHERE, in hopes of eliminating tuples before aggregation * instead of after. * - * If the query has explicit grouping then we can simply move such a + * If the query has no empty grouping set then we can simply move such a * clause into WHERE; any group that fails the clause will not be in the * output because none of its tuples will reach the grouping or - * aggregation stage. Otherwise we must have a degenerate (variable-free) - * HAVING clause, which we put in WHERE so that query_planner() can use it - * in a gating Result node, but also keep in HAVING to ensure that we - * don't emit a bogus aggregated row. (This could be done better, but it - * seems not worth optimizing.) + * aggregation stage. Otherwise we have to keep the clause in HAVING to + * ensure that we don't emit a bogus aggregated row. But then the HAVING + * clause must be degenerate (variable-free), so we can copy it into WHERE + * so that query_planner() can use it in a gating Result node. (This could + * be done better, but it seems not worth optimizing.) * * Note that a HAVING clause may contain expressions that are not fully * preprocessed. This can happen if these expressions are part of * grouping items. In such cases, they are replaced with GROUP Vars in - * the parser and then replaced back after we've done with expression + * the parser and then replaced back after we're done with expression * preprocessing on havingQual. This is not an issue if the clause * remains in HAVING, because these expressions will be matched to lower * target items in setrefs.c. However, if the clause is moved or copied @@ -1182,8 +1195,11 @@ subquery_planner(PlannerGlobal *glob, Query *parse, char *plan_name, /* keep it in HAVING */ newHaving = lappend(newHaving, havingclause); } - else if (parse->groupClause) + else if (parse->groupClause && + (parse->groupingSets == NIL || + (List *) linitial(parse->groupingSets) != NIL)) { + /* There is GROUP BY, but no empty grouping set */ Node *whereclause; /* Preprocess the HAVING clause fully */ @@ -1196,6 +1212,7 @@ subquery_planner(PlannerGlobal *glob, Query *parse, char *plan_name, } else { + /* There is an empty grouping set (perhaps implicitly) */ Node *whereclause; /* Preprocess the HAVING clause fully */ @@ -2181,10 +2198,13 @@ grouping_planner(PlannerInfo *root, double tuple_fraction, } /* - * Do preprocessing for groupingSets clause and related data. This handles the - * preliminary steps of expanding the grouping sets, organizing them into lists - * of rollups, and preparing annotations which will later be filled in with - * size estimates. + * Do preprocessing for groupingSets clause and related data. + * + * We expect that parse->groupingSets has already been expanded into a flat + * list of grouping sets (that is, just integer Lists of ressortgroupref + * numbers) by expand_grouping_sets(). This function handles the preliminary + * steps of organizing the grouping sets into lists of rollups, and preparing + * annotations which will later be filled in with size estimates. */ static grouping_sets_data * preprocess_grouping_sets(PlannerInfo *root) @@ -2195,19 +2215,18 @@ preprocess_grouping_sets(PlannerInfo *root) ListCell *lc_set; grouping_sets_data *gd = palloc0(sizeof(grouping_sets_data)); - parse->groupingSets = expand_grouping_sets(parse->groupingSets, parse->groupDistinct, -1); - - gd->any_hashable = false; - gd->unhashable_refs = NULL; - gd->unsortable_refs = NULL; - gd->unsortable_sets = NIL; - /* * We don't currently make any attempt to optimize the groupClause when * there are grouping sets, so just duplicate it in processed_groupClause. */ root->processed_groupClause = parse->groupClause; + /* Detect unhashable and unsortable grouping expressions */ + gd->any_hashable = false; + gd->unhashable_refs = NULL; + gd->unsortable_refs = NULL; + gd->unsortable_sets = NIL; + if (parse->groupClause) { ListCell *lc; diff --git a/src/test/regress/expected/groupingsets.out b/src/test/regress/expected/groupingsets.out index 991121545c5..8e354759da6 100644 --- a/src/test/regress/expected/groupingsets.out +++ b/src/test/regress/expected/groupingsets.out @@ -890,7 +890,8 @@ explain (costs off) -> Seq Scan on gstest2 (10 rows) --- test pushdown of HAVING clause that does not reference any columns that are nullable by grouping sets +-- test pushdown of non-degenerate HAVING clause that does not reference any +-- columns that are nullable by grouping sets explain (costs off) select a, b, count(*) from gstest2 group by grouping sets ((a, b), (a)) having a > 1 and b > 1; QUERY PLAN @@ -911,6 +912,65 @@ select a, b, count(*) from gstest2 group by grouping sets ((a, b), (a)) having a 2 | 2 | 1 (1 row) +explain (costs off) +select a, b, count(*) from gstest2 group by rollup(a), b having b > 1; + QUERY PLAN +--------------------------------- + GroupAggregate + Group Key: b, a + Group Key: b + -> Sort + Sort Key: b, a + -> Seq Scan on gstest2 + Filter: (b > 1) +(7 rows) + +select a, b, count(*) from gstest2 group by rollup(a), b having b > 1; + a | b | count +---+---+------- + 1 | 2 | 1 + 2 | 2 | 1 + | 2 | 2 +(3 rows) + +-- test pushdown of degenerate HAVING clause +explain (costs off) +select count(*) from gstest2 group by grouping sets (()) having false; + QUERY PLAN +----------------------------------- + Aggregate + Group Key: () + Filter: false + -> Result + Replaces: Scan on gstest2 + One-Time Filter: false +(6 rows) + +select count(*) from gstest2 group by grouping sets (()) having false; + count +------- +(0 rows) + +explain (costs off) +select a, count(*) from gstest2 group by grouping sets ((a), ()) having false; + QUERY PLAN +----------------------------------------- + GroupAggregate + Group Key: a + Group Key: () + Filter: false + -> Sort + Sort Key: a + -> Result + Replaces: Scan on gstest2 + One-Time Filter: false +(9 rows) + +select a, count(*) from gstest2 group by grouping sets ((a), ()) having false; + a | count +---+------- +(0 rows) + -- HAVING with GROUPING queries select ten, grouping(ten) from onek group by grouping sets(ten) having grouping(ten) >= 0 diff --git a/src/test/regress/sql/groupingsets.sql b/src/test/regress/sql/groupingsets.sql index 38d3cdd0fd8..74e0bddebf1 100644 --- a/src/test/regress/sql/groupingsets.sql +++ b/src/test/regress/sql/groupingsets.sql @@ -290,11 +290,25 @@ explain (costs off) select v.c, (select count(*) from gstest2 group by () having v.c) from (values (false),(true)) v(c) order by v.c; --- test pushdown of HAVING clause that does not reference any columns that are nullable by grouping sets +-- test pushdown of non-degenerate HAVING clause that does not reference any +-- columns that are nullable by grouping sets explain (costs off) select a, b, count(*) from gstest2 group by grouping sets ((a, b), (a)) having a > 1 and b > 1; select a, b, count(*) from gstest2 group by grouping sets ((a, b), (a)) having a > 1 and b > 1; +explain (costs off) +select a, b, count(*) from gstest2 group by rollup(a), b having b > 1; +select a, b, count(*) from gstest2 group by rollup(a), b having b > 1; + +-- test pushdown of degenerate HAVING clause +explain (costs off) +select count(*) from gstest2 group by grouping sets (()) having false; +select count(*) from gstest2 group by grouping sets (()) having false; + +explain (costs off) +select a, count(*) from gstest2 group by grouping sets ((a), ()) having false; +select a, count(*) from gstest2 group by grouping sets ((a), ()) having false; + -- HAVING with GROUPING queries select ten, grouping(ten) from onek group by grouping sets(ten) having grouping(ten) >= 0
pgsql-hackers by date: