Re: Removing useless DISTINCT clauses - Mailing list pgsql-hackers

From Jim Finnerty
Subject Re: Removing useless DISTINCT clauses
Date
Msg-id 1534968715834-0.post@n3.nabble.com
Whole thread Raw
In response to Re: Removing useless DISTINCT clauses  (David Rowley <david.rowley@2ndquadrant.com>)
Responses Re: Removing useless DISTINCT clauses
List pgsql-hackers
distinct_optimization_v6.patch
<http://www.postgresql-archive.org/file/t348990/distinct_optimization_v6.patch>  

Here is an update to this thread, for potential inclusion in v12.  I
couldn't get the most recent 'v5' patch to apply cleanly, so I recreated a
v6 patch on PG10.5 by hand, and made a few changes and improvements:

-  (a) If there is only one relation and the PK is present in the SELECT
list, then the distinctClause can be removed.  Thus, a query of the form
SELECT DISTINCT pk, * FROM t; recognizes that the body of the SELECT is
already distinct on pk, and doesn't do an unnecessary DISTINCT operation
-  (b) If there is only one relation, the PK is present, and there are no
aggregates, then the groupByClause can similarly be removed
-  (c) If DISTINCT ON is specified, but no ORDER BY is specified, then it
acts like a regular DISTINCT
-  (d) If the distinct clause is modified, this fact is recorded in a new
bool in the Query struct so that when the distinct was modified,
create_distinct_paths no longer Asserts that the needed pathkeys are
contained in the distinct_pathkeys.
-  (e) Since the new bool is in the Query struct,
copyfuncs/equalfuncs/outfuncs/readfuncs support is also provided.  This will
cause a problem unless the pg_rewrite catalog is regenerated.  

The pg_rewrite catalog contains a serialized representation of the Query
node in its ev_action column.  If there is a way to recreate the contents of
the pg_rewrite relation without bumping the catversion, can someone please
explain how?  If not, then this change is incomplete and would require a new
catalog version (catversion.h) too.

Additional work on this patch would be desirable.  It should check for
unique + not null, in addition to just the pk constraint.  The DISTINCT
could be eliminated in cases with multiple relations if all the joins are
1:1, although that would arguably belong in a different patch.

    /Jim

p.s. the v6 patch works for the problem case that Tom Lane reported with the
v5 patch




--
Sent from: http://www.postgresql-archive.org/PostgreSQL-hackers-f1928748.html


pgsql-hackers by date:

Previous
From: "Daniel Verite"
Date:
Subject: Re: csv format for psql
Next
From: Nikita Glukhov
Date:
Subject: Re: jsonpath