pgsql: Consider opfamily and collation when removing redundant GROUP BY - Mailing list pgsql-committers

From Richard Guo
Subject pgsql: Consider opfamily and collation when removing redundant GROUP BY
Date
Msg-id E1wLCC1-000npr-2W@gemulon.postgresql.org
Whole thread
List pgsql-committers
Consider opfamily and collation when removing redundant GROUP BY columns

remove_useless_groupby_columns() uses a relation's unique indexes to
prove that some GROUP BY columns are functionally dependent on others,
and so can be dropped from the GROUP BY clause.  The match between
index columns and GROUP BY columns was done by attno alone, ignoring
two equality-relation issues.

A type may belong to multiple btree opfamilies whose notions of
equality differ.  The record type, for instance, has record_ops
(per-field equality) and record_image_ops (bytewise equality).  A
unique index under one opfamily does not prove uniqueness under the
equality used by GROUP BY when the SortGroupClause's eqop comes from a
different opfamily.

Likewise, since nondeterministic collations were introduced in PG 12,
two collations may disagree on equality, and a unique index under one
collation does not prove uniqueness under another.

In either case, rows that the index considers distinct can collapse
into a single GROUP BY group, taking ungrouped columns of differing
values with them, so the planner drops a column that is not in fact
functionally dependent and produces wrong results.

Fix by requiring, for each unique-index key column, that some GROUP BY
item on the same column has an eqop in the index's opfamily and a
collation that agrees on equality with the index's collation.  This
mirrors the combined check relation_has_unique_index_for() applies to
join clauses.

This is a v18 regression: commit bd10ec529 extended
remove_useless_groupby_columns() from primary-key constraints to
arbitrary unique indexes.  Before that, the function consulted only
primary keys, whose enforcement index is required by parse_utilcmd.c
to use the default opclass and the column's declared collation, so
neither mismatch could arise.  Back-patch to v18 only.

Author: Richard Guo <guofenglinux@gmail.com>
Reviewed-by: Ayush Tiwari <ayushtiwari.slg01@gmail.com>
Discussion: https://postgr.es/m/CAMbWs49t6uArWoTT-cHY+nhsi23nJJKcF9Xb9cYGzaZ9kNJ98g@mail.gmail.com
Backpatch-through: 18

Branch
------
REL_18_STABLE

Details
-------
https://git.postgresql.org/pg/commitdiff/5c214b58b0599e9900dc777b3e00ea7120e7e10d

Modified Files
--------------
src/backend/optimizer/plan/initsplan.c         | 67 +++++++++++++++++++++++---
src/test/regress/expected/aggregates.out       | 31 ++++++++++++
src/test/regress/expected/collate.icu.utf8.out | 32 ++++++++++++
src/test/regress/sql/aggregates.sql            | 17 +++++++
src/test/regress/sql/collate.icu.utf8.sql      | 20 ++++++++
src/tools/pgindent/typedefs.list               |  1 +
6 files changed, 160 insertions(+), 8 deletions(-)


pgsql-committers by date:

Previous
From: Richard Guo
Date:
Subject: pgsql: Fix HAVING-to-WHERE pushdown for simple-CASE form
Next
From: Etsuro Fujita
Date:
Subject: pgsql: postgres_fdw: Fix syntax error in fetch_attstats().