Re: Removing useless DISTINCT clauses - Mailing list pgsql-hackers

From Melanie Plageman
Subject Re: Removing useless DISTINCT clauses
Date
Msg-id 152160295559.12147.1676278544502314555.pgcf@coridan.postgresql.org
Whole thread Raw
In response to Re: [HACKERS] Removing useless DISTINCT clauses  (David Rowley <david.rowley@2ndquadrant.com>)
Responses Re: Removing useless DISTINCT clauses  (David Rowley <david.rowley@2ndquadrant.com>)
Re: Removing useless DISTINCT clauses  (Laurenz Albe <laurenz.albe@cybertec.at>)
List pgsql-hackers
The following review has been posted through the commitfest application:
make installcheck-world:  tested, passed
Implements feature:       tested, passed
Spec compliant:           tested, passed
Documentation:            tested, passed

This is a review of the patch to remove useless DISTINCT columns from the DISTINCT clause
https://www.postgresql.org/message-id/CAKJS1f8UMJ137sRuVSnEMDDpa57Q71JuLZi4yLCFMekNYVYqaQ%40mail.gmail.com


Contents & Purpose
==================

This patch removes any additional columns in the DISTINCT clause when the
table's primary key columns are entirely present in the DISTINCT clause. This
optimization works because the PK columns functionally determine the other
columns in the relation. The patch includes regression test cases.

Initial Run
===========
The patch applies cleanly to HEAD. All tests which are run as part of the
`installcheck` target pass correctly (all existing tests pass, all new tests
pass with the patch applied and fail without it). All TAP tests pass. All tests
which are run as part of the `installcheck-world` target pass except one of the
Bloom contrib module tests in `contrib/bloom/sql/bloom.sql`,
 `CREATE INDEX bloomidx ON tst USING bloom (i, t) WITH (col1 = 3);`
which is currently also failing (crashing) for me on master, so it is unrelated to the
patch. The test cases seem to cover the new behavior.

Functionality
=============
Given that this optimization is safe for the GROUP BY clause (you can remove
functionally dependent attributes from the clause there as well), the logic
does seem to follow for DISTINCT. It seems semantically correct to do this. As
for the implementation, the author seems to have reasonably addressed concerns
expressed over the distinction between DISTINCT and DISTINCT ON. As far as I
can see, this should be fine.

Code Review
===========
For a small performance hit but an improvement in readability, the length check
can be moved from the individual group by and distinct clause checks into the
helper function

    if (list_length(parse->distinctClause) < 2)
      return;

    and

     if (list_length(parse->groupClause) < 2)
      return;

can be moved into `remove_functionally_dependent_clauses`


Comments/Documentation
=======================

The main helper function that is added `remove_functionally_dependent_clauses`
uses one style of comment--with the name of the function and then the rest of
the description indented which is found some places in the code, 
/*
 * remove_functionally_dependent_clauses
 *        Processes clauselist and removes any items which are deemed to be
 *        functionally dependent on other clauselist items.
 *
 * If any item from the list can be removed, then a new list is built which
 * does not contain the removed items.  If no item can be removed then the
 * original list is returned.
 */

while other helper functions in the same file use a different style--all lines
flush to the side with a space. I'm not sure which is preferred

/*
 * limit_needed - do we actually need a Limit plan node?
 *
 * If we have constant-zero OFFSET and constant-null LIMIT, we can skip adding
 * a Limit node.  This is worth checking for because "OFFSET 0" is a common
 * locution for an optimization fence.  (Because other places in the planner
 ...

it looks like the non-indented ones are from older commits (2013 vs 2016).

The list length change is totally optional, but I'll leave it as Waiting On Author in case the author wants to make
thatchange.
 

Overall, LGTM.

The new status of this patch is: Waiting on Author

pgsql-hackers by date:

Previous
From: Michael Paquier
Date:
Subject: Re: [HACKERS] taking stdbool.h into use
Next
From: Andres Freund
Date:
Subject: Re: [HACKERS] taking stdbool.h into use