While working on the fix for the push-down-HAVING optimization, I
noticed $subject.
query_is_distinct_for() is intended to determine whether a query never
returns duplicates of the specified columns. For a query using
grouping sets, if there are no grouping expressions, we know there are
no non-empty grouping sets and potentially one or more empty grouping
sets. The idea is to check whether there is only one empty grouping
set, in which case the query would return only one row and thus
unique. Currently the check being performed is:
if (list_length(query->groupingSets) == 1 &&
((GroupingSet *) linitial(query->groupingSets))->kind ==
GROUPING_SET_EMPTY)
return true;
else
return false;
It seems to me that this check is not thorough enough. First, if the
DISTINCT clause is used on the GROUP BY, duplicate empty grouping sets
will be removed, leaving only one. However, the current code does not
check groupDistinct at all.
Second, if the GROUP BY clause is written in the form of "grouping
sets(())", the GroupingSet is actually a GROUPING_SET_SETS node
containing a single GROUPING_SET_EMPTY node, which still represents
only one empty grouping set. The current check does not account for
this structure.
Third, the check "list_length(query->groupingSets) == 1" is also
insufficient. It's possible to have multiple GroupingSet nodes and
still end up with a single empty grouping set, as long as each node
contains only one GROUPING_SET_EMPTY element, for example, in a query
like "group by grouping sets (()), grouping sets (())".
To fix this issue, I suggest first checking the groupDistinct flag,
which is straightforward and does not cost much. After that, there
seem to be two possible approaches to verify whether there is only one
empty grouping set.
1) In addition to the current check, we can iterate through
query->groupingSets and verify that each GroupingSet node contains
exactly one element. This should cover all possible structures that
represent a single empty grouping set.
2) Alternatively, we can perform expand_grouping_sets() and then check
whether the resulting expanded flat list contains only one element.
I'm somewhat inclined toward option #2, as it seems to be a more
principled approach. It could be argued that calling
expand_grouping_sets() can introduce planning overhead, but I don't
think this is a problem, because 1) we're dealing with a simple
grouping sets structure that contains only empty grouping sets, and 2)
since we have already checked groupDistinct, there's no need to
perform deduplication of grouping sets within expand_grouping_sets().
Therefore, I expect the overhead from calling expand_grouping_sets()
to be minimal.
Attached is a patch along the lines of option #2. The LCOV report
indicates that there is currently no test coverage for the "else if
(query->groupingSets)" branch in query_is_distinct_for(). This patch
also adds test cases to cover that branch.
Thoughts?
- Richard