Re: [HACKERS] Removing useless DISTINCT clauses - Mailing list pgsql-hackers

From Tom Lane
Subject Re: [HACKERS] Removing useless DISTINCT clauses
Date
Msg-id 9554.1515529605@sss.pgh.pa.us
Whole thread Raw
In response to Re: [HACKERS] Removing useless DISTINCT clauses  (David Rowley <david.rowley@2ndquadrant.com>)
Responses Re: [HACKERS] Removing useless DISTINCT clauses
Re: [HACKERS] Removing useless DISTINCT clauses
List pgsql-hackers
David Rowley <david.rowley@2ndquadrant.com> writes:
> [ remove_useless_distinct_clauses_v2.patch ]

This is a cute idea, but I'm troubled by a couple of points:

1. Once you don't have all the tlist items shown in DISTINCT, it really is
more like DISTINCT ON, seems like.  I am not sure it's a good idea to set
hasDistinctOn, because that engages some planner behaviors we probably
don't want, but I'm also not sure we can get away with just ignoring the
difference.  As an example, in allpaths.c there are assorted assumptions
that having a distinctClause but !hasDistinctOn means all output columns
of a subquery are listed in the distinctClause.

2. There's a comment in planner.c to the effect that

         * When we have DISTINCT ON, we must sort by the more rigorous of
         * DISTINCT and ORDER BY, else it won't have the desired behavior.
         * Also, if we do have to do an explicit sort, we might as well use
         * the more rigorous ordering to avoid a second sort later.  (Note
         * that the parser will have ensured that one clause is a prefix of
         * the other.)

Removing random elements of the distinctClause will break its
correspondence with the sortClause, with probably bad results.

I do not remember for sure at the moment, but it may be that this
correspondence is only important for the case of DISTINCT ON, in which
case we could dodge the problem by not applying the optimization unless
it's plain DISTINCT.  That doesn't help us with point 1 though.

BTW, my dictionary says it's "dependent" not "dependant".

            regards, tom lane


pgsql-hackers by date:

Previous
From: Alvaro Herrera
Date:
Subject: Re: [HACKERS] [PATCH] Assert that the correct locks are held whencalling PageGetLSN()
Next
From: Peter Eisentraut
Date:
Subject: Re: Add %r substitution for psql prompts to show recovery status