Thread: Apply the "LIMIT 1" optimization to partial DISTINCT

Apply the "LIMIT 1" optimization to partial DISTINCT

From
Richard Guo
Date:
In 5543677ec9 we introduced an optimization that uses Limit instead of
Unique to implement DISTINCT when all the DISTINCT pathkeys have been
marked as redundant.  I happened to notice that this optimization was
not applied to partial DISTINCT, which I think should be.  This can
improve plans in some cases, such as

-- on master
explain (costs off) select distinct four from tenk1 where four = 4;
                  QUERY PLAN
----------------------------------------------
 Limit
   ->  Gather
         Workers Planned: 4
         ->  Unique
               ->  Parallel Seq Scan on tenk1
                     Filter: (four = 4)
(6 rows)

-- patched
explain (costs off) select distinct four from tenk1 where four = 4;
                  QUERY PLAN
----------------------------------------------
 Limit
   ->  Gather
         Workers Planned: 4
         ->  Limit
               ->  Parallel Seq Scan on tenk1
                     Filter: (four = 4)
(6 rows)

Such queries might not be that common, but it's very cheap to apply this
optimization.

Attached is a patch for that.

Thanks
Richard
Attachment

Re: Apply the "LIMIT 1" optimization to partial DISTINCT

From
David Rowley
Date:
On Fri, 26 Jan 2024 at 20:42, Richard Guo <guofenglinux@gmail.com> wrote:
>
> In 5543677ec9 we introduced an optimization that uses Limit instead of
> Unique to implement DISTINCT when all the DISTINCT pathkeys have been
> marked as redundant.  I happened to notice that this optimization was
> not applied to partial DISTINCT, which I think should be.

It seems very likely that the parallel plan would only be chosen if
the planner estimated there'd just be 1 row before the distinct.
Otherwise, the non-partial path's LIMIT would come out so cheap that
it would be unlikely that the parallel plan would be picked.

I think your test case only chooses the parallel plan because you're
doing FROM tenk1 WHERE four=4.  And that column only contains values
0..3.

However, having said that. Parallel plans are often picked when there
is some highly selective qual as parallel_tuple_cost has to be applied
to fewer tuples for such plans, so probably this is worth doing.

David



Re: Apply the "LIMIT 1" optimization to partial DISTINCT

From
David Rowley
Date:
On Fri, 26 Jan 2024 at 21:14, David Rowley <dgrowleyml@gmail.com> wrote:
> However, having said that. Parallel plans are often picked when there
> is some highly selective qual as parallel_tuple_cost has to be applied
> to fewer tuples for such plans, so probably this is worth doing.

I was messing around with your test case and didn't manage to get any
plan that had any rows to use the partial path with the LIMIT.  I
ended up dropping the test that was checking the results were empty as
I didn't think it added much more value over the EXPLAIN output.

I pushed the result.

Thanks for working on this.

David



Re: Apply the "LIMIT 1" optimization to partial DISTINCT

From
Richard Guo
Date:

On Wed, Jan 31, 2024 at 12:26 PM David Rowley <dgrowleyml@gmail.com> wrote:
On Fri, 26 Jan 2024 at 21:14, David Rowley <dgrowleyml@gmail.com> wrote:
> However, having said that. Parallel plans are often picked when there
> is some highly selective qual as parallel_tuple_cost has to be applied
> to fewer tuples for such plans, so probably this is worth doing.

I was messing around with your test case and didn't manage to get any
plan that had any rows to use the partial path with the LIMIT.  I
ended up dropping the test that was checking the results were empty as
I didn't think it added much more value over the EXPLAIN output.

I pushed the result.

Thanks for pushing it!

Thanks
Richard