Home > mailing lists

Re: DISTINCT with btree skip scan - Mailing list pgsql-hackers

From	Thomas Munro
Subject	Re: DISTINCT with btree skip scan
Date	November 24, 2016 00:19:43
Msg-id	CAEepm=2r5zqNfNvJ6fRoPX9K9nqtPqNUTKG_OucNeBMnaaB9gQ@mail.gmail.com Whole thread Raw
In response to	Re: DISTINCT with btree skip scan (Thomas Munro <thomas.munro@enterprisedb.com>)
Responses	Re: DISTINCT with btree skip scan Re: DISTINCT with btree skip scan
List	pgsql-hackers

Tree view

On Wed, Oct 12, 2016 at 4:19 PM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:
> Here it is, now split into two parts: one patch
> to add an amskip operation, and another to consider using it to
> implement DISTINCT when it was already has an index only scan path on
> an index that supports skipping.

Those patches add an amskip operation and then teach the planner and
executor how to do this, given a table foo (a, b) with an index on (a)
or (a, b):
 SELECT DISTINCT foo FROM a
 Index Only Scan for distinct values of (a) using foo_pkey on foo

In just the right circumstances that is vastly faster than scanning
every tuple and uniquifying.  I was thinking that the next step in
this experiment might be to introduce a kind of plan that can handle
queries where the index's leading column is not in the WHERE clause,
building on the above plan, and behaving conceptually a little bit
like a Nested Loop, like so:
 SELECT * FROM foo WHERE b = 42
 Index Skip Scan   ->  Index Only Scan for distinct values of (a) using foo_pkey on foo   ->  Index Only Scan using
foo_pkeyon foo         Index Cond: ((a = outer.a) AND (b = 42))

I suppose you could instead have a single node that knows how to
perform the whole scan itself so that you don't finish up searching
for each distinct value in both the inner and outer subplan, but it
occurred to me that building the plan out of Lego bricks like this
might generalise better to more complicated queries.  I suppose it
might also parallelise nicely with a Parallel Index Only Scan as the
outer plan.

Maybe something similar is possible for handling "GROUP BY a".

Worth pursuing?  Does amskip suck?  Does anyone have better ideas,
either for how to do the low level skip or the higher level Index Skip
Scan, or perhaps a completely different way of looking at this?

-- 
Thomas Munro
http://www.enterprisedb.com

pgsql-hackers by date:

From: "Maeldron T."
Date: 23 November 2016, 22:45:35
Subject: Re: Google Cloud Compute + FreeBSD + PostgreSQL = timecounter issue

From: Tom Lane
Date: 24 November 2016, 00:28:22
Subject: Re: Calculation of param_source_rels in add_paths_to_joinrel

Re: DISTINCT with btree skip scan - Mailing list pgsql-hackers

Previous

Next