Re: Index Skip Scan - Mailing list pgsql-hackers
From | Tomas Vondra |
---|---|
Subject | Re: Index Skip Scan |
Date | |
Msg-id | 20200306144917.2esaoyttgimt3lgn@development Whole thread Raw |
In response to | Re: Index Skip Scan (David Rowley <dgrowleyml@gmail.com>) |
Responses |
Re: Index Skip Scan
|
List | pgsql-hackers |
On Wed, Mar 04, 2020 at 11:32:00AM +1300, David Rowley wrote: >On Tue, 18 Feb 2020 at 05:24, Dmitry Dolgov <9erthalion6@gmail.com> wrote: >> Here is something similar to what I had in mind. > >(changing to this email address for future emails) > >Hi, > >I've been looking over v32 of the patch and have a few comments >regarding the planner changes. > >I think the changes in create_distinct_paths() need more work. The >way I think this should work is that create_distinct_paths() gets to >know exactly nothing about what path types support the elimination of >duplicate values. The Path should carry the UniqueKeys so that can be >determined. In create_distinct_paths() you should just be able to make >use of those paths, which should already have been created when >creating index paths for the rel due to PlannerInfo's query_uniquekeys >having been set. > +1 to code this in a generic way, using query_uniquekeys (if possible) >The reason it must be done this way is that when the RelOptInfo that >we're performing the DISTINCT on is a joinrel, then we're not going to >see any IndexPaths in the RelOptInfo's pathlist. We'll have some sort >of Join path instead. I understand you're not yet supporting doing >this optimisation when joins are involved, but it should be coded in >such a way that it'll work when we do. (It's probably also a separate >question as to whether we should have this only work when there are no >joins. I don't think I personally object to it for stage 1, but >perhaps someone else might think differently.) > I don't follow. Can you elaborate more? AFAICS skip-scan is essentially a capability of an (index) AM. I don't see how we could ever do that for joinrels? We can do that at the scan level, below a join, but that's what this patch already supports, I think. When you say "supporting this optimisation" with joins, do you mean doing skip-scan for join inputs, or on top of the join? >For storing these new paths with UniqueKeys, I'm not sure exactly if >we can just add_path() such paths into the RelOptInfo's pathlist. >What we don't want to do is accidentally make use of paths which >eliminate duplicate values when we don't want that behaviour. If we >did store these paths in RelOptInfo->pathlist then we'd need to go and >modify a bunch of places to ignore such paths. set_cheapest() would >have to do something special for them too, which makes me think >pathlist is the incorrect place. Parallel query added >partial_pathlist, so perhaps we need unique_pathlist to make this >work. > Hmmm, good point. Do we actually produce incorrect plans with the current patch, using skip-scan path when we should not? regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
pgsql-hackers by date: