Re: [PATCH] Keeps tracking the uniqueness with UniqueKey - Mailing list pgsql-hackers

From: Dmitry Dolgov
Subject: Re: [PATCH] Keeps tracking the uniqueness with UniqueKey
Msg-id: 20200609102913.jydqirxee5fdybhx@localhost
In response to: Re: [PATCH] Keeps tracking the uniqueness with UniqueKey (David Rowley <dgrowleyml@gmail.com>)
List: pgsql-hackers

> On Sun, Jun 07, 2020 at 06:51:22PM +1200, David Rowley wrote:
>
> > * one in create_distinct_paths as per current implementation
> >
> > with what seems to be similar content.
>
> I think we need to have UniqueKeys in RelOptInfo so we can describe
> what a relation is unique by.  There's no point for example in
> creating skip scan paths for a relation that's already unique on
> whatever we might try to skip scan on. e.g. someone does:
>
> SELECT DISTINCT unique_and_indexed_column FROM tab;
>
> Since there's a unique index on unique_and_indexed_column then we
> needn't try to create a skipscan path for it.
>
> However, the advantages of having UniqueKeys on the RelOptInfo go a
> little deeper than that.  We can make use of it anywhere we
> currently call relation_has_unique_index_for(). Plus we get what
> Andy wants and can skip useless DISTINCT operations when the result is
> already unique on the distinct clause.  Sure we could carry all the
> relation's unique properties around in Paths, but that's not the right
> place. It's logically a property of the relation, not the path
> specifically.  RelOptInfo is a good place to store the properties of
> relations.
>
> The idea of the meaning of uniquekeys within a path is that the path
> is specifically making those keys unique.  We're not duplicating the
> RelOptInfo's uniquekeys there.
>
> If we have a table like:
>
> CREATE TABLE tab (
>    a INT PRIMARY KEY,
>    b INT NOT NULL
> );
>
> CREATE INDEX tab_b_idx ON tab (b);
>
> Then I'd expect a query such as: SELECT DISTINCT b FROM tab; to have
> the uniquekeys for tab's RelOptInfo set to {a}, and the seqscan and
> index scan paths uniquekey properties set to NULL, but the skipscan
> index path uniquekeys for tab_b_idx set to {b}.  Then when we go
> create the distinct paths Andy's work will see that there's no
> RelOptInfo uniquekeys for the distinct clause, but the skip scan work
> will loop over the unique_pathlist and find that we have a skipscan
> path with the required uniquekeys, a.k.a {b}.
>
> Does that make sense?

Yes, from this point of view it makes sense. I've already posted the
first version of index skip scan based on this implementation [1]. There
could be rough edges, but overall I hope we're on the same page.
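
To make sure we read the split the same way, here is a toy model of the
example above. It is only a sketch in Python, not planner code: Rel, Path
and plan_distinct are hypothetical names for illustration, key matching is
reduced to plain column-name sets, and NULL handling is ignored (both
columns in the example are NOT NULL).

```python
from dataclasses import dataclass, field
from typing import FrozenSet, List, Optional


@dataclass
class Path:
    """Hypothetical stand-in for a planner path."""
    name: str
    # Keys this path itself makes unique; None means it adds no uniqueness.
    uniquekeys: Optional[FrozenSet[str]] = None


@dataclass
class Rel:
    """Hypothetical stand-in for RelOptInfo."""
    name: str
    # What the relation is unique by, independent of any path.
    uniquekeys: List[FrozenSet[str]]
    pathlist: List[Path] = field(default_factory=list)
    unique_pathlist: List[Path] = field(default_factory=list)


def plan_distinct(rel: Rel, distinct_cols: List[str]) -> str:
    wanted = frozenset(distinct_cols)
    # Relation-level check: if the relation is unique on a subset of the
    # distinct columns, the DISTINCT step has nothing left to do.
    if any(uk <= wanted for uk in rel.uniquekeys):
        return "distinct is a no-op"
    # Path-level check: look for a path that makes exactly these keys unique.
    for path in rel.unique_pathlist:
        if path.uniquekeys == wanted:
            return "use " + path.name
    return "add explicit unique step"


# The table from the example: a is the primary key, b has a plain index.
tab = Rel(
    name="tab",
    uniquekeys=[frozenset({"a"})],
    pathlist=[Path("seqscan"), Path("indexscan on tab_b_idx")],
    unique_pathlist=[Path("skipscan on tab_b_idx", frozenset({"b"}))],
)

print(plan_distinct(tab, ["b"]))
print(plan_distinct(tab, ["a"]))
```

In this model DISTINCT b finds nothing at the relation level and picks the
skipscan path from unique_pathlist, while DISTINCT a is answered by the
relation's uniquekeys alone and never looks at the paths.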

[1]: https://www.postgresql.org/message-id/flat/20200609102247.jdlatmfyeecg52fi%40localhost


