Re: DISTINCT with btree skip scan - Mailing list pgsql-hackers

From Thomas Munro
Subject Re: DISTINCT with btree skip scan
Date
Msg-id CAEepm=0n0rxGWQH13pr3qD4z5zFK=0RqnosjEaPGfcSq7EyxBA@mail.gmail.com
Whole thread Raw
In response to Re: DISTINCT with btree skip scan  (Thomas Munro <munro@ip9.org>)
Responses Re: DISTINCT with btree skip scan  (Thomas Munro <thomas.munro@enterprisedb.com>)
List pgsql-hackers
On Sat, Nov 1, 2014 at 10:35 AM, Thomas Munro <munro@ip9.org> wrote:
> I am definitely interested in collaborating on a series of patches to
> implement various kinds of skip-based plans as seen in other RDBMSs if
> others think it could be useful.  I see skip-based DISTINCT as a good
> place to start.  (I suspect the semi-related skip scan plan (for the
> case when your WHERE clause doesn't have a qual for the leading
> column(s)) is much harder to plan and execute and I also suspect it's
> less generally useful).

There's an interesting thread about merge joins with a skip
optimisation[1] ("leapfrog trie-join", though I haven't read the paper
yet).  That reminded me to go and rebase this experimental patch which
is somewhat related.  Here it is, now split into two parts: one patch
to add an amskip operation, and another to consider using it to
implement DISTINCT when it was already has an index only scan path on
an index that supports skipping.  As you can see it's still sketchy
experiment-grade code, but it now has a first attempt at costing.

postgres=# select count(a) from foo;
┌─────────┐
│  count  │
├─────────┤
│ 5000000 │
└─────────┘
(1 row)

Time: 493.501 ms
postgres=# select distinct a from foo;
┌───┐
│ a │
├───┤
│ 0 │
│ 1 │
│ 2 │
│ 3 │
│ 4 │
│ 5 │
│ 6 │
│ 7 │
│ 8 │
│ 9 │
└───┘
(10 rows)

Time: 0.516 ms
postgres=# explain select distinct a from foo;

┌─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│                                                     QUERY PLAN
                                               │

├─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ Index Only Scan for distinct values of leading 1 column(s) using
foo_pkey on foo  (cost=0.43..4.33 rows=10 width=4) │

└─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
(1 row)

Time: 0.378 ms

[1] https://www.postgresql.org/message-id/flat/CAFQtOwp4e1hU6gKi1R_nHgOLTzUJko8EcaC1FdAV0rQFo1ARzA%40mail.gmail.com

-- 
Thomas Munro
http://www.enterprisedb.com

Attachment

pgsql-hackers by date:

Previous
From: Amit Langote
Date:
Subject: Re: int2vector and btree indexes
Next
From: Haribabu Kommi
Date:
Subject: Re: New SQL counter statistics view (pg_stat_sql)