Re: Remove useless GROUP BY columns considering unique index - Mailing list pgsql-hackers

From David Rowley
Subject Re: Remove useless GROUP BY columns considering unique index
Date
Msg-id CAApHDvqHW0wo8rFzrtMgYa=bkxjfKOxSjiz_QqO5najWHjoRRg@mail.gmail.com
Whole thread Raw
In response to Remove useless GROUP BY columns considering unique index  (Zhang Mingli <zmlpostgres@gmail.com>)
Responses Re: Remove useless GROUP BY columns considering unique index
List pgsql-hackers
On Sat, 30 Dec 2023 at 04:05, Zhang Mingli <zmlpostgres@gmail.com> wrote:
> So my patch make it easy: check unique index’s columns, it’s a valid candidate if all of that have NOT NULL
constraint.
> And we choose a best one who has the least column numbers in get_min_unique_not_null_attnos(), as the reason: less
columnsmean that more group by columns could be removed. 

This patch no longer applies.  We no longer catalogue NOT NULL
constraints, which this patch is coded to rely upon. (Likely it could
just look at pg_attribute.attnotnull instead)

Aside from that, I'm not sure about the logic to prefer removing
functionally dependent columns using the constraint with the least
columns.  If you had the PRIMARY KEY on two int columns and a unique
index on a 1MB varlena column, I think using the primary key would be
a better choice to remove functionally dependent columns of.  Maybe
it's ok to remove any functionally dependant columns on the primary
key first and try removing functionally dependent columns on unique
constraints and a 2nd step (or maybe only if none can be removed using
the PK?)

Also, why constraints rather than unique indexes?  You'd need to check
for expression indexes and likely ignore those due to no ability to
detect NOT NULL for expressions.

Also, looking at the patch to how you've coded
get_min_unique_not_null_attnos(), it looks like you're very optimistic
about that being a constraint that has columns mentioned in the GROUP
BY clause. It looks like it  won't work if the UNIQUE constraint with
the least columns gets no mention in the GROUP BY clause. That'll
cause performance regressions from what we have today when the primary
key is mentioned and columns can be removed using that but a UNIQUE
constraint exists which has no corresponding GROUP BY columns and
fewer unique columns than the PK. That's not good and it'll need to be
fixed.

Set to waiting on author.

David



pgsql-hackers by date:

Previous
From: Tatsuo Ishii
Date:
Subject: Re: Add RESPECT/IGNORE NULLS and FROM FIRST/LAST options
Next
From: Tom Lane
Date:
Subject: Re: Remove shadowed declaration warnings