Re: [PATCH] Erase the distinctClause if the result is unique by definition - Mailing list pgsql-hackers

From Andy Fan
Subject Re: [PATCH] Erase the distinctClause if the result is unique by definition
Date
Msg-id CAKU4AWoipseP+_QpgrbiVdD7WviEpWP0mKbdXFd-tfkxJofnMA@mail.gmail.com
Whole thread Raw
In response to Re: [PATCH] Erase the distinctClause if the result is unique by definition  (David Rowley <dgrowleyml@gmail.com>)
Responses Re: [PATCH] Erase the distinctClause if the result is unique by definition  (David Rowley <dgrowleyml@gmail.com>)
List pgsql-hackers


On Wed, Mar 11, 2020 at 6:49 AM David Rowley <dgrowleyml@gmail.com> wrote:
On Wed, 11 Mar 2020 at 02:50, Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:
>
> On Tue, Mar 10, 2020 at 1:49 PM Andy Fan <zhihui.fan1213@gmail.com> wrote:
> > In my current implementation, it calculates the uniqueness for each
> > BaseRel only, but in your way,  looks we need to calculate the
> > UniquePathKey for both BaseRel and JoinRel.   This makes more
> > difference for multi table join.
>
> I didn't understand this concern. I think, it would be better to do it
> for all kinds of relation types base, join etc. This way we are sure
> that one method works across the planner to eliminate the need for
> Distinct or grouping. If we just implement something for base
> relations right now and don't do that for joins, there is a chance
> that that method may not work for joins when we come to implement it.

Yeah, it seems to me that we're seeing more and more features that
require knowledge of uniqueness of a RelOptInfo. The skip scans patch
needs to know if a join will cause row duplication so it knows if the
skip scan path can be joined to without messing up the uniqueness of
the skip scan.  Adding more and more places that loop over the rel's
indexlist just does not seem the right way to do it, especially so
when you have to dissect the join rel down to its base rel components
to check which indexes there are. Having the knowledge on-hand at the
RelOptInfo level means we no longer have to look at indexes for unique
proofs.

> > Another concern is UniquePathKey
> > is designed for a general purpose,  we need to maintain it no matter
> > distinctClause/groupbyClause.
>
> This should be ok. The time spent in annotating a RelOptInfo about
> uniqueness is not going to be a lot. But doing so would help generic
> elimination of Distinct/Group/Unique operations. What is
> UniquePathKey; I didn't find this in your patch or in the code.

Possibly a misinterpretation. There is some overlap between this patch
and the skip scan patch, both would like to skip doing explicit work
to implement DISTINCT. Skip scans just go about it by adding new path
types that scan the index and only gathers up unique values.  In that
case, the RelOptInfo won't contain the unique keys, but the skip scan
path will. How I imagine both of these patches working together in
create_distinct_paths() is that we first check if the DISTINCT clause
is a subset of the a set of the RelOptInfo's unique keys (this patch),
else we check if there are any paths with uniquekeys that we can use
to perform a no-op on the DISTINCT clause (the skip scan patch), if
none of those apply, we create the required paths to uniquify the
results.

Now I am convinced that we should maintain UniquePath on RelOptInfo,
I would see how to work with "Index Skip Scan" patch.

pgsql-hackers by date:

Previous
From: Peter Geoghegan
Date:
Subject: Re: Improve search for missing parent downlinks in amcheck
Next
From: Laurenz Albe
Date:
Subject: Re: Berserk Autovacuum (let's save next Mandrill)