Home > mailing lists

Re: [PATCH] Erase the distinctClause if the result is unique by definition - Mailing list pgsql-hackers

From	Andy Fan
Subject	Re: [PATCH] Erase the distinctClause if the result is unique by definition
Date	March 11, 2020 04:23:30
Msg-id	CAKU4AWoipseP+_QpgrbiVdD7WviEpWP0mKbdXFd-tfkxJofnMA@mail.gmail.com Whole thread Raw
In response to	Re: [PATCH] Erase the distinctClause if the result is unique by definition (David Rowley <dgrowleyml@gmail.com>)
Responses	Re: [PATCH] Erase the distinctClause if the result is unique by definition
List	pgsql-hackers

Tree view

On Wed, Mar 11, 2020 at 6:49 AM David Rowley <dgrowleyml@gmail.com> wrote:

On Wed, 11 Mar 2020 at 02:50, Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:
>
> On Tue, Mar 10, 2020 at 1:49 PM Andy Fan <zhihui.fan1213@gmail.com> wrote:
> > In my current implementation, it calculates the uniqueness for each
> > BaseRel only, but in your way, looks we need to calculate the
> > UniquePathKey for both BaseRel and JoinRel. This makes more
> > difference for multi table join.
>
> I didn't understand this concern. I think, it would be better to do it
> for all kinds of relation types base, join etc. This way we are sure
> that one method works across the planner to eliminate the need for
> Distinct or grouping. If we just implement something for base
> relations right now and don't do that for joins, there is a chance
> that that method may not work for joins when we come to implement it.

Yeah, it seems to me that we're seeing more and more features that
require knowledge of uniqueness of a RelOptInfo. The skip scans patch
needs to know if a join will cause row duplication so it knows if the
skip scan path can be joined to without messing up the uniqueness of
the skip scan. Adding more and more places that loop over the rel's
indexlist just does not seem the right way to do it, especially so
when you have to dissect the join rel down to its base rel components
to check which indexes there are. Having the knowledge on-hand at the
RelOptInfo level means we no longer have to look at indexes for unique
proofs.

> > Another concern is UniquePathKey
> > is designed for a general purpose, we need to maintain it no matter
> > distinctClause/groupbyClause.
>
> This should be ok. The time spent in annotating a RelOptInfo about
> uniqueness is not going to be a lot. But doing so would help generic
> elimination of Distinct/Group/Unique operations. What is
> UniquePathKey; I didn't find this in your patch or in the code.

Possibly a misinterpretation. There is some overlap between this patch
and the skip scan patch, both would like to skip doing explicit work
to implement DISTINCT. Skip scans just go about it by adding new path
types that scan the index and only gathers up unique values. In that
case, the RelOptInfo won't contain the unique keys, but the skip scan
path will. How I imagine both of these patches working together in
create_distinct_paths() is that we first check if the DISTINCT clause
is a subset of the a set of the RelOptInfo's unique keys (this patch),
else we check if there are any paths with uniquekeys that we can use
to perform a no-op on the DISTINCT clause (the skip scan patch), if
none of those apply, we create the required paths to uniquify the
results.

Now I am convinced that we should maintain UniquePath on RelOptInfo,

I would see how to work with "Index Skip Scan" patch.

pgsql-hackers by date:

From: Peter Geoghegan
Date: 11 March 2020, 04:19:31
Subject: Re: Improve search for missing parent downlinks in amcheck

From: Laurenz Albe
Date: 11 March 2020, 04:24:28
Subject: Re: Berserk Autovacuum (let's save next Mandrill)

Re: [PATCH] Erase the distinctClause if the result is unique by definition - Mailing list pgsql-hackers

Previous

Next