> On Sun, Jun 07, 2020 at 06:51:22PM +1200, David Rowley wrote:
>
> > * one in create_distinct_paths as per current implementation
> >
> > with what seems to be similar content.
>
> I think we need to have UniqueKeys in RelOptInfo so we can describe
> what a relation is unique by. There's no point for example in
> creating skip scan paths for a relation that's already unique on
> whatever we might try to skip scan on. e.g. someone does:
>
> SELECT DISTINCT unique_and_indexed_column FROM tab;
>
> Since there's a unique index on unique_and_indexed_column then we
> needn't try to create a skipscan path for it.
>
> However, the advantages of having UniqueKeys on the RelOptInfo go a
> little deeper than that. We can make use of it anywhere we
> currently call relation_has_unique_index_for(). Plus we get what
> Andy wants and can skip useless DISTINCT operations when the result is
> already unique on the distinct clause. Sure we could carry all the
> relation's unique properties around in Paths, but that's not the right
> place. It's logically a property of the relation, not the path
> specifically. RelOptInfo is a good place to store the properties of
> relations.
>
> The idea of the meaning of uniquekeys within a path is that the path
> is specifically making those keys unique. We're not duplicating the
> RelOptInfo's uniquekeys there.
>
> If we have a table like:
>
> CREATE TABLE tab (
> a INT PRIMARY KEY,
> b INT NOT NULL
> );
>
> CREATE INDEX tab_b_idx ON tab (b);
>
> Then I'd expect a query such as: SELECT DISTINCT b FROM tab; to have
> the uniquekeys for tab's RelOptInfo set to {a}, and the seqscan and
> index scan paths' uniquekeys set to NULL, but the skipscan
> index path's uniquekeys for tab_b_idx set to {b}. Then when we go to
> create the distinct paths, Andy's work will see that there are no
> RelOptInfo uniquekeys for the distinct clause, but the skip scan work
> will loop over the unique_pathlist and find that we have a skipscan
> path with the required uniquekeys, a.k.a. {b}.
>
> Does that make sense?

Yes, from this point of view it makes sense. I've already posted the
first version of index skip scan based on this implementation [1]. There
could be rough edges, but overall I hope we're on the same page.
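
To check my reading of the split described above, here is a minimal
standalone sketch in C. It is only a toy model, not code from the patch:
ToyRel, ToyPath, toy_plan_distinct() and the single-column string keys
are all hypothetical stand-ins for RelOptInfo, Path and the real
uniquekeys lists. It shows the two lookups: first the relation-level
uniquekeys (DISTINCT is a no-op), then the per-path uniquekeys (a path
such as a skipscan already produces distinct values).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Toy stand-in for Path; hypothetical, not the PostgreSQL struct. */
typedef struct ToyPath {
    const char *desc;       /* human-readable label for the path */
    const char *uniquekeys; /* column this path itself makes unique, or NULL */
} ToyPath;

/* Toy stand-in for RelOptInfo; hypothetical, single-column keys only. */
typedef struct ToyRel {
    const char    *uniquekeys; /* column the relation is unique by, or NULL */
    const ToyPath *paths;      /* candidate paths for this relation */
    size_t         npaths;
} ToyRel;

typedef enum {
    DISTINCT_NOT_NEEDED,       /* relation is already unique on the column */
    DISTINCT_VIA_UNIQUE_PATH,  /* some path already makes the column unique */
    DISTINCT_NEEDS_UNIQUE_STEP /* fall back to sort/hash de-duplication */
} ToyDistinctPlan;

static bool key_matches(const char *key, const char *col)
{
    return key != NULL && strcmp(key, col) == 0;
}

/*
 * Decide how to satisfy SELECT DISTINCT <distinct_col>.
 * *chosen is set to the matching path, or NULL when no path was picked.
 */
static ToyDistinctPlan
toy_plan_distinct(const ToyRel *rel, const char *distinct_col,
                  const ToyPath **chosen)
{
    assert(rel != NULL && distinct_col != NULL && chosen != NULL);
    *chosen = NULL;

    /* Relation-level property: nothing to de-duplicate at all. */
    if (key_matches(rel->uniquekeys, distinct_col))
        return DISTINCT_NOT_NEEDED;

    /* Path-level property: look for a path that makes the column unique. */
    for (size_t i = 0; i < rel->npaths; i++) {
        if (key_matches(rel->paths[i].uniquekeys, distinct_col)) {
            *chosen = &rel->paths[i];
            return DISTINCT_VIA_UNIQUE_PATH;
        }
    }
    return DISTINCT_NEEDS_UNIQUE_STEP;
}

/* The example table from above: a is the primary key, b is indexed. */
static const ToyPath tab_paths[] = {
    { "seqscan on tab",          NULL },
    { "index scan on tab_b_idx", NULL },
    { "skipscan on tab_b_idx",   "b"  },
};
static const ToyRel tab = { "a", tab_paths, 3 };
```

With this toy data, asking for DISTINCT b skips the relation-level
check (tab is unique by a, not b) and picks the skipscan path, while
asking for DISTINCT a stops at the relation-level check and picks no
path at all.
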
[1]: https://www.postgresql.org/message-id/flat/20200609102247.jdlatmfyeecg52fi%40localhost