pgsql: Detect redundant GROUP BY columns using UNIQUE indexes - Mailing list pgsql-committers

From David Rowley
Subject pgsql: Detect redundant GROUP BY columns using UNIQUE indexes
Date
Msg-id E1tLYwr-002HWX-5J@gemulon.postgresql.org
Whole thread Raw
List pgsql-committers
Detect redundant GROUP BY columns using UNIQUE indexes

d4c3a156c added support that when the GROUP BY contained all of the
columns belonging to a relation's PRIMARY KEY, all other columns
belonging to that relation would be removed from the GROUP BY clause.
That's possible because all other columns are functionally dependent on
the PRIMARY KEY and those columns alone ensure the groups are distinct.

Here we expand on that optimization and allow it to work for any unique
indexes on the table rather than just the PRIMARY KEY index.  This
normally requires that all columns in the index are defined with NOT NULL,
however, we can relax that requirement when the index is defined with
NULLS NOT DISTINCT.

When there are multiple suitable indexes to allow columns to be removed,
we prefer the index with the least number of columns as this allows us
to remove the highest number of GROUP BY columns.  One day, we may want to
revisit that decision as it may make more sense to use the narrower set of
columns in terms of the width of the data types and stored/queried data.

This also adjusts the code to make use of RelOptInfo.indexlist rather
than looking up the catalog tables.

In passing, add another short-circuit path to allow bailing out earlier
in cases where it's certainly not possible to remove redundant GROUP BY
columns.  This early exit is now cheaper to do than when this code was
originally written as 00b41463c made it cheaper to check for empty
Bitmapsets.

Patch originally by Zhang Mingli and later worked on by jian he, but after
I (David) worked on it, there was very little of the original left.

Author: Zhang Mingli, jian he, David Rowley
Reviewed-by: jian he, Andrei Lepikhov
Discussion: https://postgr.es/m/327990c8-b9b2-4b0c-bffb-462249f82de0%40Spark

Branch
------
master

Details
-------
https://git.postgresql.org/pg/commitdiff/bd10ec529796a13670645e6acd640c6f290df020

Modified Files
--------------
src/backend/optimizer/plan/initsplan.c   | 120 +++++++++++++++++++++++++------
src/backend/optimizer/util/plancat.c     |   1 +
src/include/nodes/pathnodes.h            |   2 +
src/test/regress/expected/aggregates.out |  67 +++++++++++++++++
src/test/regress/sql/aggregates.sql      |  32 +++++++++
5 files changed, 200 insertions(+), 22 deletions(-)


pgsql-committers by date:

Previous
From: Richard Guo
Date:
Subject: pgsql: Improve the test case from 5668a857d
Next
From: Michael Paquier
Date:
Subject: pgsql: Adjust some comments about structure properties in pg_stat.h