Home > mailing lists

DISTINCT with btree skip scan - Mailing list pgsql-hackers

From	Thomas Munro
Subject	DISTINCT with btree skip scan
Date	July 5, 2014 03:17:45
Msg-id	CADLWmXXbTSBxP-MzJuPAYSsL_2f0iPm5VWPbCvDbVvfX93FKkw@mail.gmail.com Whole thread Raw
Responses	Re: DISTINCT with btree skip scan Re: DISTINCT with btree skip scan
List	pgsql-hackers

Tree view

Hello

As an exercise I hacked up the simplest code I could think of that would demonstrate a faster DISTINCT based on skipping ahead to the next distinct value in an index-only scan. Please see the attached (extremely buggy) patch, and the example session below. (It's against my natural instinct to send such half-baked early hacking phase code to the list, but thought it would make sense to demo the concept and then seek advice, warnings, cease and desist notices etc before pressing on down that route!) I would be most grateful for any feedback you might have.

Best regards,

Thomas Munro

postgres=# create table foo (a int, b int, primary key (a, b));
CREATE TABLE
Time: 9.002 ms
postgres=# insert into foo select generate_series(1, 5000000) % 10, generate_series(1, 5000000);
INSERT 0 5000000
Time: 35183.444 ms
postgres=# analyze foo;
ANALYZE
Time: 493.168 ms

postgres=# set enable_hashagg = true;
SET
Time: 0.274 ms
postgres=# explain select distinct a from foo;
┌───────────────────────────────────────────────────────────────────┐
│ QUERY PLAN │
├───────────────────────────────────────────────────────────────────┤
│ HashAggregate (cost=84624.00..84624.10 rows=10 width=4) │
│ Group Key: a │
│ -> Seq Scan on foo (cost=0.00..72124.00 rows=5000000 width=4) │
│ Planning time: 0.065 ms │
└───────────────────────────────────────────────────────────────────┘
(4 rows)

Time: 0.500 ms
postgres=# select distinct a from foo;
┌───┐
│ a │
├───┤
│ 6 │
│ 8 │
│ 1 │
│ 2 │
│ 3 │
│ 4 │
│ 5 │
│ 9 │
│ 7 │
│ 0 │
└───┘
(10 rows)

Time: 2017.019 ms

postgres=# set enable_hashagg = false;
SET
Time: 0.302 ms
postgres=# explain select distinct a from foo;
┌─────────────────────────────────────────────────────────────────────────────────────────────────────┐
│ QUERY PLAN │
├─────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ Index Only Scan for distinct prefix 1 using foo_pkey on foo (cost=0.43..263354.20 rows=10 width=4) │
│ Planning time: 0.063 ms │
└─────────────────────────────────────────────────────────────────────────────────────────────────────┘
(2 rows)

Time: 0.443 ms
postgres=# select distinct a from foo;
┌───┐
│ a │
├───┤
│ 0 │
│ 1 │
│ 2 │
│ 3 │
│ 4 │
│ 5 │
│ 6 │
│ 7 │
│ 8 │
│ 9 │
└───┘
(10 rows)

Time: 0.565 ms

Attachment

distinct-skip.patch

pgsql-hackers by date:

From: Kouhei Kaigai
Date: 05 July 2014, 00:38:54
Subject: Re: RLS Design

From: Vik Fearing
Date: 05 July 2014, 04:04:21
Subject: Re: DISTINCT with btree skip scan

DISTINCT with btree skip scan - Mailing list pgsql-hackers

Attachment

Previous

Next