Re: [PATCH] Erase the distinctClause if the result is unique by definition - Mailing list pgsql-hackers

From Ashutosh Bapat
Subject Re: [PATCH] Erase the distinctClause if the result is unique by definition
Date
Msg-id CAExHW5tZ_JMfuBp-=SJUUvm4KUoEY_3rk204AmxozqQ_zKACPA@mail.gmail.com
Whole thread Raw
In response to Re: [PATCH] Erase the distinctClause if the result is unique by definition  (David Rowley <dgrowleyml@gmail.com>)
List pgsql-hackers
On Wed, Mar 11, 2020 at 4:19 AM David Rowley <dgrowleyml@gmail.com> wrote:
>
> On Wed, 11 Mar 2020 at 02:50, Ashutosh Bapat
> <ashutosh.bapat.oss@gmail.com> wrote:
> >
> > On Tue, Mar 10, 2020 at 1:49 PM Andy Fan <zhihui.fan1213@gmail.com> wrote:
> > > In my current implementation, it calculates the uniqueness for each
> > > BaseRel only, but in your way,  looks we need to calculate the
> > > UniquePathKey for both BaseRel and JoinRel.   This makes more
> > > difference for multi table join.
> >
> > I didn't understand this concern. I think, it would be better to do it
> > for all kinds of relation types base, join etc. This way we are sure
> > that one method works across the planner to eliminate the need for
> > Distinct or grouping. If we just implement something for base
> > relations right now and don't do that for joins, there is a chance
> > that that method may not work for joins when we come to implement it.
>
> Yeah, it seems to me that we're seeing more and more features that
> require knowledge of uniqueness of a RelOptInfo. The skip scans patch
> needs to know if a join will cause row duplication so it knows if the
> skip scan path can be joined to without messing up the uniqueness of
> the skip scan.  Adding more and more places that loop over the rel's
> indexlist just does not seem the right way to do it, especially so
> when you have to dissect the join rel down to its base rel components
> to check which indexes there are. Having the knowledge on-hand at the
> RelOptInfo level means we no longer have to look at indexes for unique
> proofs.

+1. Yep. When we break join down to the base relation, partitioned
relation pose another challenge that the partitioned relation may not
have an index on it per say but each partition may have it and the
index key happens to be part of the partition key. That case would be
easy to track through RelOptInfo instead of breaking a base rel down
into its child rels.

>
> > > Another concern is UniquePathKey
> > > is designed for a general purpose,  we need to maintain it no matter
> > > distinctClause/groupbyClause.
> >
> > This should be ok. The time spent in annotating a RelOptInfo about
> > uniqueness is not going to be a lot. But doing so would help generic
> > elimination of Distinct/Group/Unique operations. What is
> > UniquePathKey; I didn't find this in your patch or in the code.
>
> Possibly a misinterpretation. There is some overlap between this patch
> and the skip scan patch, both would like to skip doing explicit work
> to implement DISTINCT. Skip scans just go about it by adding new path
> types that scan the index and only gathers up unique values.  In that
> case, the RelOptInfo won't contain the unique keys, but the skip scan
> path will. How I imagine both of these patches working together in
> create_distinct_paths() is that we first check if the DISTINCT clause
> is a subset of the a set of the RelOptInfo's unique keys (this patch),
> else we check if there are any paths with uniquekeys that we can use
> to perform a no-op on the DISTINCT clause (the skip scan patch), if
> none of those apply, we create the required paths to uniquify the
> results.

Looks good to me. But I have not seen index skip patch.

-- 
Best Wishes,
Ashutosh Bapat



pgsql-hackers by date:

Previous
From: Ashutosh Bapat
Date:
Subject: Re: [PATCH] Erase the distinctClause if the result is unique by definition
Next
From: Tobias Stadler
Date:
Subject: User and Logical Replication