Re: Inconsistent Behavior of GROUP BY ROLLUP in v17 vs master - Mailing list pgsql-hackers
From | Tom Lane |
---|---|
Subject | Re: Inconsistent Behavior of GROUP BY ROLLUP in v17 vs master |
Date | |
Msg-id | 227430.1760969744@sss.pgh.pa.us Whole thread Raw |
In response to | Re: Inconsistent Behavior of GROUP BY ROLLUP in v17 vs master (Richard Guo <guofenglinux@gmail.com>) |
Responses |
Re: Inconsistent Behavior of GROUP BY ROLLUP in v17 vs master
|
List | pgsql-hackers |
Richard Guo <guofenglinux@gmail.com> writes: > On Sat, Oct 18, 2025 at 6:14 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: >>> The proposed patch tries to close the hole by checking whether >>> the condition is degenerate, but that feels subtly wrong to me: >>> what we ought to check is whether there is any empty grouping set. > v1 patch tries to close this gap by detecting degenerate HAVING > clauses and preventing them from being pushed down. While this may > cause us to miss the optimization for degenerate clauses, I felt it > was acceptable, because degenerate HAVING clauses don't seem to be > common in practice, and detecting the presence of empty grouping sets > doesn't seem a trivial task. I actually agree with you that it's not that critical whether we fully optimize degenerate HAVING clauses in every case. (Although it might be bad if we miss any cases that previous versions handled.) What bothered me about v1 was that it seemed to be operating from a conceptually wrong model, which perhaps could lead to bugs in future. > I took a look at has_empty_grouping_set() in v2 patch, but I'm afraid > it's not thorough enough. I don't think it's correct to return true > once we find an EMPTY, ROLLUP, or CUBE GroupingSet. I think we need > to compute the Cartesian product across different GroupingSets, just > like what expand_grouping_sets() does. Oh, good point. We could probably make it handle this with a little more complication, or decide that it's okay if we fail to optimize some cases --- but if we're going to do expand_grouping_sets() anyway, we might as well rely on that. > Rather than extending has_empty_grouping_set() to compute the > Cartesian product across different GroupingSets, I wonder if we could > move the call to expand_grouping_sets() from > preprocess_grouping_sets() to just before the push-down-HAVING > optimization. This way, we could leverage the expanded flat list of > grouping sets to accurately determine whether any empty grouping sets > exist. I like the concept here, but not so much the details. Pulling expand_grouping_sets out of preprocess_grouping_sets feels weird. I guess it's all right given that preprocess_grouping_sets doesn't manipulate the parse tree otherwise, but you didn't update the comment for preprocess_grouping_sets. Also it might be a good idea to have a test case demonstrating why v2 wasn't good enough. > I did a quick search and didn't find any code between the > push-down-HAVING optimization and preprocess_grouping_sets() that > relies on parse->groupingSets being a list of GroupingSet's, so this > should be safe. I think it's actually quite awful that we replace parse->groupingSets with data of a totally different representation. It would be better to store the result of expand_grouping_sets into some new PlannerInfo field, say root->preprocessed_groupingSets. But we can't make a change like that in v18 now, so this is just something for future cleanup. regards, tom lane
pgsql-hackers by date: