Re: Inconsistent Behavior of GROUP BY ROLLUP in v17 vs master - Mailing list pgsql-hackers

From Tom Lane
Subject Re: Inconsistent Behavior of GROUP BY ROLLUP in v17 vs master
Date
Msg-id 227430.1760969744@sss.pgh.pa.us
Whole thread Raw
In response to Re: Inconsistent Behavior of GROUP BY ROLLUP in v17 vs master  (Richard Guo <guofenglinux@gmail.com>)
Responses Re: Inconsistent Behavior of GROUP BY ROLLUP in v17 vs master
List pgsql-hackers
Richard Guo <guofenglinux@gmail.com> writes:
> On Sat, Oct 18, 2025 at 6:14 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> The proposed patch tries to close the hole by checking whether
>>> the condition is degenerate, but that feels subtly wrong to me:
>>> what we ought to check is whether there is any empty grouping set.

> v1 patch tries to close this gap by detecting degenerate HAVING
> clauses and preventing them from being pushed down.  While this may
> cause us to miss the optimization for degenerate clauses, I felt it
> was acceptable, because degenerate HAVING clauses don't seem to be
> common in practice, and detecting the presence of empty grouping sets
> doesn't seem a trivial task.

I actually agree with you that it's not that critical whether we fully
optimize degenerate HAVING clauses in every case.  (Although it might
be bad if we miss any cases that previous versions handled.)  What
bothered me about v1 was that it seemed to be operating from a
conceptually wrong model, which perhaps could lead to bugs in future.

> I took a look at has_empty_grouping_set() in v2 patch, but I'm afraid
> it's not thorough enough.  I don't think it's correct to return true
> once we find an EMPTY, ROLLUP, or CUBE GroupingSet.  I think we need
> to compute the Cartesian product across different GroupingSets, just
> like what expand_grouping_sets() does.

Oh, good point.  We could probably make it handle this with a little
more complication, or decide that it's okay if we fail to optimize
some cases --- but if we're going to do expand_grouping_sets() anyway,
we might as well rely on that.

> Rather than extending has_empty_grouping_set() to compute the
> Cartesian product across different GroupingSets, I wonder if we could
> move the call to expand_grouping_sets() from
> preprocess_grouping_sets() to just before the push-down-HAVING
> optimization.  This way, we could leverage the expanded flat list of
> grouping sets to accurately determine whether any empty grouping sets
> exist.

I like the concept here, but not so much the details.  Pulling
expand_grouping_sets out of preprocess_grouping_sets feels weird.
I guess it's all right given that preprocess_grouping_sets doesn't
manipulate the parse tree otherwise, but you didn't update the comment
for preprocess_grouping_sets.  Also it might be a good idea to have a
test case demonstrating why v2 wasn't good enough.

> I did a quick search and didn't find any code between the
> push-down-HAVING optimization and preprocess_grouping_sets() that
> relies on parse->groupingSets being a list of GroupingSet's, so this
> should be safe.

I think it's actually quite awful that we replace parse->groupingSets
with data of a totally different representation.  It would be better
to store the result of expand_grouping_sets into some new PlannerInfo
field, say root->preprocessed_groupingSets.  But we can't make a
change like that in v18 now, so this is just something for future
cleanup.

            regards, tom lane



pgsql-hackers by date:

Previous
From: Andrew Dunstan
Date:
Subject: Re: Speed up COPY FROM text/CSV parsing using SIMD
Next
From: Álvaro Herrera
Date:
Subject: Re: LISTEN/NOTIFY bug: VACUUM sets frozenxid past a xid in async queue