Re: Index Skip Scan - Mailing list pgsql-hackers

From Tomas Vondra
Subject Re: Index Skip Scan
Date
Msg-id 20200306144917.2esaoyttgimt3lgn@development
Whole thread Raw
In response to Re: Index Skip Scan  (David Rowley <dgrowleyml@gmail.com>)
Responses Re: Index Skip Scan  (David Rowley <dgrowleyml@gmail.com>)
List pgsql-hackers
On Wed, Mar 04, 2020 at 11:32:00AM +1300, David Rowley wrote:
>On Tue, 18 Feb 2020 at 05:24, Dmitry Dolgov <9erthalion6@gmail.com> wrote:
>> Here is something similar to what I had in mind.
>
>(changing to this email address for future emails)
>
>Hi,
>
>I've been looking over v32 of the patch and have a few comments
>regarding the planner changes.
>
>I think the changes in create_distinct_paths() need more work.  The
>way I think this should work is that create_distinct_paths() gets to
>know exactly nothing about what path types support the elimination of
>duplicate values.  The Path should carry the UniqueKeys so that can be
>determined. In create_distinct_paths() you should just be able to make
>use of those paths, which should already have been created when
>creating index paths for the rel due to PlannerInfo's query_uniquekeys
>having been set.
>

+1 to code this in a generic way, using query_uniquekeys (if possible)

>The reason it must be done this way is that when the RelOptInfo that
>we're performing the DISTINCT on is a joinrel, then we're not going to
>see any IndexPaths in the RelOptInfo's pathlist. We'll have some sort
>of Join path instead.  I understand you're not yet supporting doing
>this optimisation when joins are involved, but it should be coded in
>such a way that it'll work when we do.  (It's probably also a separate
>question as to whether we should have this only work when there are no
>joins. I don't think I personally object to it for stage 1, but
>perhaps someone else might think differently.)
>

I don't follow. Can you elaborate more?

AFAICS skip-scan is essentially a capability of an (index) AM. I don't
see how we could ever do that for joinrels? We can do that at the scan
level, below a join, but that's what this patch already supports, I
think. When you say "supporting this optimisation" with joins, do you
mean doing skip-scan for join inputs, or on top of the join?

>For storing these new paths with UniqueKeys, I'm not sure exactly if
>we can just add_path() such paths into the RelOptInfo's pathlist.
>What we don't want to do is accidentally make use of paths which
>eliminate duplicate values when we don't want that behaviour.  If we
>did store these paths in RelOptInfo->pathlist then we'd need to go and
>modify a bunch of places to ignore such paths. set_cheapest() would
>have to do something special for them too, which makes me think
>pathlist is the incorrect place.  Parallel query added
>partial_pathlist, so perhaps we need unique_pathlist to make this
>work.
>

Hmmm, good point. Do we actually produce incorrect plans with the
current patch, using skip-scan path when we should not?


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



pgsql-hackers by date:

Previous
From: David Steele
Date:
Subject: Re: More tests to stress directly checksum_impl.h
Next
From: Jeremy Finzel
Date:
Subject: Re: PHJ file leak.