Home > mailing lists

Re: DISTINCT with btree skip scan - Mailing list pgsql-hackers

From	Thomas Munro
Subject	Re: DISTINCT with btree skip scan
Date	October 12, 2016 06:20:02
Msg-id	CAEepm=0n0rxGWQH13pr3qD4z5zFK=0RqnosjEaPGfcSq7EyxBA@mail.gmail.com Whole thread Raw
In response to	Re: DISTINCT with btree skip scan (Thomas Munro <munro@ip9.org>)
Responses	Re: DISTINCT with btree skip scan (Thomas Munro <thomas.munro@enterprisedb.com>)
List	pgsql-hackers

Tree view

On Sat, Nov 1, 2014 at 10:35 AM, Thomas Munro <munro@ip9.org> wrote:
> I am definitely interested in collaborating on a series of patches to
> implement various kinds of skip-based plans as seen in other RDBMSs if
> others think it could be useful.  I see skip-based DISTINCT as a good
> place to start.  (I suspect the semi-related skip scan plan (for the
> case when your WHERE clause doesn't have a qual for the leading
> column(s)) is much harder to plan and execute and I also suspect it's
> less generally useful).

There's an interesting thread about merge joins with a skip
optimisation[1] ("leapfrog trie-join", though I haven't read the paper
yet).  That reminded me to go and rebase this experimental patch which
is somewhat related.  Here it is, now split into two parts: one patch
to add an amskip operation, and another to consider using it to
implement DISTINCT when it was already has an index only scan path on
an index that supports skipping.  As you can see it's still sketchy
experiment-grade code, but it now has a first attempt at costing.

postgres=# select count(a) from foo;
┌─────────┐
│  count  │
├─────────┤
│ 5000000 │
└─────────┘
(1 row)

Time: 493.501 ms
postgres=# select distinct a from foo;
┌───┐
│ a │
├───┤
│ 0 │
│ 1 │
│ 2 │
│ 3 │
│ 4 │
│ 5 │
│ 6 │
│ 7 │
│ 8 │
│ 9 │
└───┘
(10 rows)

Time: 0.516 ms
postgres=# explain select distinct a from foo;

┌─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│                                                     QUERY PLAN
                                               │

├─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ Index Only Scan for distinct values of leading 1 column(s) using
foo_pkey on foo  (cost=0.43..4.33 rows=10 width=4) │

└─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
(1 row)

Time: 0.378 ms

[1] https://www.postgresql.org/message-id/flat/CAFQtOwp4e1hU6gKi1R_nHgOLTzUJko8EcaC1FdAV0rQFo1ARzA%40mail.gmail.com

-- 
Thomas Munro
http://www.enterprisedb.com

Attachment

pgsql-hackers by date:

From: Amit Langote
Date: 12 October 2016, 05:32:41
Subject: Re: int2vector and btree indexes

From: Haribabu Kommi
Date: 12 October 2016, 06:21:51
Subject: Re: New SQL counter statistics view (pg_stat_sql)

Re: DISTINCT with btree skip scan - Mailing list pgsql-hackers

Attachment

Previous

Next