Remove GROUP BY columns that are functionally dependent on other columns.
If a GROUP BY clause includes all columns of a non-deferred primary key,
as well as other columns of the same relation, those other columns are
redundant and can be dropped from the grouping; the pkey is enough to
ensure that each row of the table corresponds to a separate group.
Getting rid of the excess columns will reduce the cost of the sorting or
hashing needed to implement GROUP BY, and can indeed remove the need for
a sort step altogether.
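To make the rule concrete, here is a minimal Python sketch under a
deliberately simplified model. Each GROUP BY item is a (relation, column)
pair, and each relation's primary key is known up front as a set of
column names plus a flag saying whether it is deferrable. `PrimaryKey`
and `drop_redundant_group_by` are hypothetical names used only for
illustration. This is not the planner's C code, which operates on its
own internal query representation.

```python
from typing import Dict, FrozenSet, List, NamedTuple, Set, Tuple

# A GROUP BY item in this simplified model: (relation name, column name).
GroupItem = Tuple[str, str]


class PrimaryKey(NamedTuple):
    """Hypothetical stand-in for a relation's primary-key constraint."""
    columns: FrozenSet[str]
    # Models the "non-deferred" requirement: a key whose checking may be
    # deferred is not relied on to guarantee uniqueness.
    deferrable: bool = False


def drop_redundant_group_by(group_by: List[GroupItem],
                            pkeys: Dict[str, PrimaryKey]) -> List[GroupItem]:
    """Drop columns already determined by a fully grouped-by primary key."""
    # Collect the grouped-by columns of each relation.
    cols_by_rel: Dict[str, Set[str]] = {}
    for rel, col in group_by:
        cols_by_rel.setdefault(rel, set()).add(col)

    # Relations whose complete, non-deferrable primary key is in GROUP BY.
    covered = {
        rel for rel, cols in cols_by_rel.items()
        if rel in pkeys
        and not pkeys[rel].deferrable
        and pkeys[rel].columns <= cols
    }

    # For covered relations keep only the key columns; leave all other
    # relations' items untouched, preserving the original order.
    return [
        (rel, col) for rel, col in group_by
        if rel not in covered or col in pkeys[rel].columns
    ]


if __name__ == "__main__":
    pkeys = {
        "orders": PrimaryKey(frozenset({"id"})),
        "customers": PrimaryKey(frozenset({"id"}), deferrable=True),
    }
    gb = [("orders", "id"), ("orders", "status"),
          ("customers", "id"), ("customers", "name")]
    print(drop_redundant_group_by(gb, pkeys))
    # [('orders', 'id'), ('customers', 'id'), ('customers', 'name')]
```

Only the non-key columns of a covered relation are dropped. A relation
whose key is absent, only partly grouped, or deferrable is left alone,
because then its grouped columns are not guaranteed to identify a
single row.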
This seems worth testing for, since many query authors are not aware of
the GROUP-BY-primary-key exception to the rule about queries not being
allowed to reference non-grouped-by columns in their targetlists or
HAVING clauses. Thus, redundant GROUP BY items are not uncommon. Also,
in most queries where it won't help, the test can be kept cheap: we
don't look up a rel's primary key until we've found that at least two
of its columns are in GROUP BY.
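The cost-saving check described above can be sketched in the same
simplified (relation, column) model. Here the primary-key lookup is
passed in as a callable that stands in, hypothetically, for a catalog
lookup. It is only invoked for relations contributing at least two
distinct GROUP BY columns: with just one, nothing from that relation
could be redundant even if that column is the key.
`drop_redundant_group_by_lazily` and `PkeyLookup` are illustrative
names, not PostgreSQL functions.

```python
from typing import Callable, Dict, FrozenSet, List, Optional, Set, Tuple

GroupItem = Tuple[str, str]  # (relation, column) in this simplified model

# Hypothetical catalog lookup: returns the non-deferrable primary-key
# columns of a relation, or None if it has no usable primary key.
PkeyLookup = Callable[[str], Optional[FrozenSet[str]]]


def drop_redundant_group_by_lazily(group_by: List[GroupItem],
                                   lookup_pkey: PkeyLookup) -> List[GroupItem]:
    # Collect the distinct grouped-by columns of each relation.
    cols_by_rel: Dict[str, Set[str]] = {}
    for rel, col in group_by:
        cols_by_rel.setdefault(rel, set()).add(col)

    removable: Dict[str, FrozenSet[str]] = {}
    for rel, cols in cols_by_rel.items():
        # Fewer than two distinct columns: nothing can be redundant,
        # so skip the (comparatively expensive) key lookup entirely.
        if len(cols) < 2:
            continue
        pkey = lookup_pkey(rel)
        # Only a key fully contained in the grouped columns, with at least
        # one extra column left over, lets us drop anything.
        if pkey is not None and pkey < cols:
            removable[rel] = pkey

    return [(rel, col) for rel, col in group_by
            if rel not in removable or col in removable[rel]]


if __name__ == "__main__":
    catalog = {"t": frozenset({"a"}), "u": frozenset({"x"})}
    calls: List[str] = []

    def lookup(rel: str) -> Optional[FrozenSet[str]]:
        calls.append(rel)  # record which relations were looked up
        return catalog.get(rel)

    gb = [("t", "a"), ("t", "b"), ("u", "x")]
    print(drop_redundant_group_by_lazily(gb, lookup))  # [('t', 'a'), ('u', 'x')]
    print(calls)  # ['t']: relation "u" never triggered a lookup
```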
David Rowley, reviewed by Julien Rouhaud
Branch
------
master
Details
-------
http://git.postgresql.org/pg/commitdiff/d4c3a156cb46dcd1f9f97a8011bd94c544079bb5
Modified Files
--------------
src/backend/catalog/pg_constraint.c      |  94 ++++++++++++++++++
src/backend/optimizer/plan/planner.c     | 159 +++++++++++++++++++++++++++++++
src/include/catalog/pg_constraint_fn.h   |   3 +
src/test/regress/expected/aggregates.out |  64 +++++++++++++
src/test/regress/expected/join.out       |  10 +-
src/test/regress/sql/aggregates.sql      |  31 ++++++
src/test/regress/sql/join.sql            |   4 +-
7 files changed, 360 insertions(+), 5 deletions(-)