Re: Index Skip Scan - Mailing list pgsql-hackers
From | David Rowley |
---|---|
Subject | Re: Index Skip Scan |
Date | |
Msg-id | CAApHDvo2abVQOiZ6wf9ExhobvvhLGWZQha0uWRAMby1NOD1ccw@mail.gmail.com Whole thread Raw |
In response to | Re: Index Skip Scan (Dmitry Dolgov <9erthalion6@gmail.com>) |
Responses |
Re: Index Skip Scan
Re: Index Skip Scan |
List | pgsql-hackers |
On Tue, 18 Feb 2020 at 05:24, Dmitry Dolgov <9erthalion6@gmail.com> wrote: > Here is something similar to what I had in mind. (changing to this email address for future emails) Hi, I've been looking over v32 of the patch and have a few comments regarding the planner changes. I think the changes in create_distinct_paths() need more work. The way I think this should work is that create_distinct_paths() gets to know exactly nothing about what path types support the elimination of duplicate values. The Path should carry the UniqueKeys so that can be determined. In create_distinct_paths() you should just be able to make use of those paths, which should already have been created when creating index paths for the rel due to PlannerInfo's query_uniquekeys having been set. The reason it must be done this way is that when the RelOptInfo that we're performing the DISTINCT on is a joinrel, then we're not going to see any IndexPaths in the RelOptInfo's pathlist. We'll have some sort of Join path instead. I understand you're not yet supporting doing this optimisation when joins are involved, but it should be coded in such a way that it'll work when we do. (It's probably also a separate question as to whether we should have this only work when there are no joins. I don't think I personally object to it for stage 1, but perhaps someone else might think differently.) For storing these new paths with UniqueKeys, I'm not sure exactly if we can just add_path() such paths into the RelOptInfo's pathlist. What we don't want to do is accidentally make use of paths which eliminate duplicate values when we don't want that behaviour. If we did store these paths in RelOptInfo->pathlist then we'd need to go and modify a bunch of places to ignore such paths. set_cheapest() would have to do something special for them too, which makes me think pathlist is the incorrect place. Parallel query added partial_pathlist, so perhaps we need unique_pathlist to make this work. Also, should create_grouping_paths() be getting the same code? Jesper's UniqueKey patch seems to set query_uniquekeys when there's a GROUP BY with no aggregates. So it looks like he has intended that something like: SELECT x FROM t GROUP BY x; should work the same way as SELECT DISTINCT x FROM t; but the 0002 patch does not make this work. Has that just been overlooked? There's also some weird looking assumptions that an EquivalenceMember can only be a Var in create_distinct_paths(). I think you're only saved from crashing there because a ProjectionPath will be created atop of the IndexPath to evaluate expressions, in which case you're not seeing the IndexPath. This results in the optimisation not working in cases like: postgres=# create table t (a int); create index on t ((a+1)); explain select distinct a+1 from t; CREATE TABLE CREATE INDEX QUERY PLAN ----------------------------------------------------------- HashAggregate (cost=48.25..50.75 rows=200 width=4) Group Key: (a + 1) -> Seq Scan on t (cost=0.00..41.88 rows=2550 width=4) Using unique paths as I mentioned above should see that fixed. David
pgsql-hackers by date: