Re: Index Skip Scan - Mailing list pgsql-hackers

From David Rowley
Subject Re: Index Skip Scan
Date
Msg-id CAApHDvo2abVQOiZ6wf9ExhobvvhLGWZQha0uWRAMby1NOD1ccw@mail.gmail.com
Whole thread Raw
In response to Re: Index Skip Scan  (Dmitry Dolgov <9erthalion6@gmail.com>)
Responses Re: Index Skip Scan
Re: Index Skip Scan
List pgsql-hackers
On Tue, 18 Feb 2020 at 05:24, Dmitry Dolgov <9erthalion6@gmail.com> wrote:
> Here is something similar to what I had in mind.

(changing to this email address for future emails)

Hi,

I've been looking over v32 of the patch and have a few comments
regarding the planner changes.

I think the changes in create_distinct_paths() need more work.  The
way I think this should work is that create_distinct_paths() gets to
know exactly nothing about what path types support the elimination of
duplicate values.  The Path should carry the UniqueKeys so that can be
determined. In create_distinct_paths() you should just be able to make
use of those paths, which should already have been created when
creating index paths for the rel due to PlannerInfo's query_uniquekeys
having been set.

The reason it must be done this way is that when the RelOptInfo that
we're performing the DISTINCT on is a joinrel, then we're not going to
see any IndexPaths in the RelOptInfo's pathlist. We'll have some sort
of Join path instead.  I understand you're not yet supporting doing
this optimisation when joins are involved, but it should be coded in
such a way that it'll work when we do.  (It's probably also a separate
question as to whether we should have this only work when there are no
joins. I don't think I personally object to it for stage 1, but
perhaps someone else might think differently.)

For storing these new paths with UniqueKeys, I'm not sure exactly if
we can just add_path() such paths into the RelOptInfo's pathlist.
What we don't want to do is accidentally make use of paths which
eliminate duplicate values when we don't want that behaviour.  If we
did store these paths in RelOptInfo->pathlist then we'd need to go and
modify a bunch of places to ignore such paths. set_cheapest() would
have to do something special for them too, which makes me think
pathlist is the incorrect place.  Parallel query added
partial_pathlist, so perhaps we need unique_pathlist to make this
work.

Also, should create_grouping_paths() be getting the same code?
Jesper's UniqueKey patch seems to set query_uniquekeys when there's a
GROUP BY with no aggregates. So it looks like he has intended that
something like:

SELECT x FROM t GROUP BY x;

should work the same way as

SELECT DISTINCT x FROM t;

but the 0002 patch does not make this work.  Has that just been overlooked?

There's also some weird looking assumptions that an EquivalenceMember
can only be a Var in create_distinct_paths(). I think you're only
saved from crashing there because a ProjectionPath will be created
atop of the IndexPath to evaluate expressions, in which case you're
not seeing the IndexPath.  This results in the optimisation not
working in cases like:

postgres=# create table t (a int); create index on t ((a+1)); explain
select distinct a+1 from t;
CREATE TABLE
CREATE INDEX
                        QUERY PLAN
-----------------------------------------------------------
 HashAggregate  (cost=48.25..50.75 rows=200 width=4)
   Group Key: (a + 1)
   ->  Seq Scan on t  (cost=0.00..41.88 rows=2550 width=4)

Using unique paths as I mentioned above should see that fixed.

David



pgsql-hackers by date:

Previous
From: Tomas Vondra
Date:
Subject: Re: PATCH: logical_work_mem and logical streaming of largein-progress transactions
Next
From: Kyotaro Horiguchi
Date:
Subject: Back-patching -Wno-format-truncation.