Re: Index Skip Scan - Mailing list pgsql-hackers

From David Rowley
Subject Re: Index Skip Scan
Date
Msg-id CAApHDvrMRHm3BoW0ZHHAmZY91HVSSkhv1zKSPfQF0QfkN7NdpQ@mail.gmail.com
Whole thread Raw
In response to Re: Index Skip Scan  (Dmitry Dolgov <9erthalion6@gmail.com>)
Responses Re: Index Skip Scan  (Dmitry Dolgov <9erthalion6@gmail.com>)
Re: Index Skip Scan  (Andy Fan <zhihui.fan1213@gmail.com>)
List pgsql-hackers
On Mon, 9 Mar 2020 at 03:21, Dmitry Dolgov <9erthalion6@gmail.com> wrote:
>
> >
> > I've been looking over v32 of the patch and have a few comments
> > regarding the planner changes.
>
> Thanks for the commentaries!
>
> > I think the changes in create_distinct_paths() need more work.  The
> > way I think this should work is that create_distinct_paths() gets to
> > know exactly nothing about what path types support the elimination of
> > duplicate values.  The Path should carry the UniqueKeys so that can be
> > determined. In create_distinct_paths() you should just be able to make
> > use of those paths, which should already have been created when
> > creating index paths for the rel due to PlannerInfo's query_uniquekeys
> > having been set.
>
> Just for me to clarify. The idea is to "move" information about what
> path types support skipping into UniqueKeys (derived from PlannerInfo's
> query_uniquekeys), but other checks (e.g. if index am supports that)
> still perform in create_distinct_paths?

create_distinct_paths() shouldn't know any details specific to the
pathtype that it's using or considering using.  All the details should
just be in Path. e.g. uniquekeys, pathkeys, costs etc. There should be
no IsA(path, ...).  Please have a look over the details in my reply to
Tomas. I hope that reply has enough information in it, but please
reply there if I've missed something.

> > On Wed, Mar 04, 2020 at 11:32:00AM +1300, David Rowley wrote:
> > There's also some weird looking assumptions that an EquivalenceMember
> > can only be a Var in create_distinct_paths(). I think you're only
> > saved from crashing there because a ProjectionPath will be created
> > atop of the IndexPath to evaluate expressions, in which case you're
> > not seeing the IndexPath.  This results in the optimisation not
> > working in cases like:
> >
> > postgres=# create table t (a int); create index on t ((a+1)); explain
> > select distinct a+1 from t;
> > CREATE TABLE
> > CREATE INDEX
> >                         QUERY PLAN
> > -----------------------------------------------------------
> >  HashAggregate  (cost=48.25..50.75 rows=200 width=4)
> >    Group Key: (a + 1)
> >    ->  Seq Scan on t  (cost=0.00..41.88 rows=2550 width=4)
>
> Yes, I need to fix it.
>
> > Using unique paths as I mentioned above should see that fixed.
>
> I'm a bit confused about this statement, how exactly unique paths should
> fix this?

The path's uniquekeys would mention that it's unique on  (a+1). You'd
compare the uniquekeys of the path to the DISTINCT clause and see that
the uniquekeys are a subset of the DISTINCT clause therefore the
DISTINCT is a no-op. If that uniquekey path is cheaper than the
cheapest_total_path + <cost of uniquification method>, then you should
pick the unique path, otherwise use the cheapest_total_path and
uniquify that.

I think the UniqueKeys may need to be changed from using
EquivalenceClasses to use Exprs instead.



pgsql-hackers by date:

Previous
From: James Coleman
Date:
Subject: Re: pg_rewind docs correction
Next
From: James Coleman
Date:
Subject: Re: Allow to_date() and to_timestamp() to accept localized names