Re: Use LIMIT instead of Unique for DISTINCT when all distinct pathkeys are redundant - Mailing list pgsql-hackers

From: Richard Guo
Subject: Re: Use LIMIT instead of Unique for DISTINCT when all distinct pathkeys are redundant
Msg-id: CAMbWs4_0BqmfWnXKihOy3Z+C5pPvF7jQbY2SkTyR65UKgre2bA@mail.gmail.com
In response to: Use LIMIT instead of Unique for DISTINCT when all distinct pathkeys are redundant (David Rowley <dgrowleyml@gmail.com>)
Responses: Re: Use LIMIT instead of Unique for DISTINCT when all distinct pathkeys are redundant (David Rowley <dgrowleyml@gmail.com>)
List: pgsql-hackers

On Wed, Oct 12, 2022 at 5:19 PM David Rowley <dgrowleyml@gmail.com> wrote:
> When all the distinct pathkeys are redundant then there can only be,
> at most, one single distinct value. There may be many rows with that
> value, but we can remove those extra ones with a LIMIT 1 rather than
> troubling over needlessly uniquifying them.
>
> This might not be a hugely common case, but: 1) it is very cheap to
> detect, and 2) the speedups are likely to be *very* good.
>
> With the attached we get:
>
> regression=# explain (analyze, costs off, timing off) SELECT DISTINCT
> four,1,2,3 FROM tenk1 WHERE four = 0;
>                    QUERY PLAN
> -------------------------------------------------
>  Limit (actual rows=1 loops=1)
>    ->  Seq Scan on tenk1 (actual rows=1 loops=1)
>          Filter: (four = 0)
>  Planning Time: 0.215 ms
>  Execution Time: 0.071 ms
>
> Naturally, if we removed the WHERE four = 0, we couldn't optimise this
> plan using this method.
>
> I see no reason why this can't work for DISTINCT ON too.
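
To make the redundancy concrete: the equality qual four = 0 puts a constant into the column's equivalence class, which is what makes the distinct pathkey on four redundant, so at most one distinct value can come out and a LIMIT 1 is enough. Below is a minimal sketch of both sides of that condition, using a throwaway table instead of tenk1 (the table, columns, and data are made up for illustration, and the first query only loses its Unique/Aggregate step once the patch is applied):

-- Throwaway table standing in for tenk1; names and data are illustrative.
CREATE TEMP TABLE t (four int, hundred int, payload text);
INSERT INTO t SELECT i % 4, i % 100, 'x' FROM generate_series(1, 10000) i;
ANALYZE t;

-- "four" is pinned to a single value by the qual, so its distinct
-- pathkey is redundant: at most one distinct row can exist, and with
-- the patch a LIMIT 1 over any path replaces the uniquification step.
EXPLAIN (COSTS OFF) SELECT DISTINCT four FROM t WHERE four = 0;

-- Without the equality qual the pathkey is not redundant, so the
-- planner still has to uniquify (Sort + Unique or HashAggregate).
EXPLAIN (COSTS OFF) SELECT DISTINCT four FROM t;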
 
For DISTINCT ON, if all the distinct pathkeys are redundant but there are
still sort pathkeys available, I think we can also apply this optimization
to adequately presorted paths, using a Limit 1 rather than a Unique.

regression=# explain (analyze, costs off, timing off) select distinct on (four) * from tenk1 where four = 0 order by four, hundred desc;
                                   QUERY PLAN
--------------------------------------------------------------------------------
 Limit (actual rows=1 loops=1)
   ->  Index Scan Backward using tenk1_hundred on tenk1 (actual rows=1 loops=1)
         Filter: (four = 0)
         Rows Removed by Filter: 300
 Planning Time: 0.165 ms
 Execution Time: 0.458 ms
(6 rows)
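
The presorted-path condition can be seen with the same throwaway table as above. This is only a sketch: the index name is made up, the Limit-only plan assumes the patch is extended to DISTINCT ON, and without a suitably presorted path the planner would presumably still need a Sort below the Limit, or simply the existing Unique plan.

-- An index on the ordering column provides a presorted path: scanning
-- it backwards yields rows in "hundred DESC" order, as in the
-- Index Scan Backward plan above.
CREATE INDEX t_hundred_idx ON t (hundred);

-- The only distinct pathkey ("four") is redundant because of four = 0,
-- and the remaining sort pathkeys are satisfied by the presorted path,
-- so the first row it returns is the DISTINCT ON winner and a LIMIT 1
-- can stand in for the Unique node when that path is chosen.
EXPLAIN (COSTS OFF)
SELECT DISTINCT ON (four) * FROM t
WHERE four = 0
ORDER BY four, hundred DESC;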

Thanks
Richard
