Re: Allow DISTINCT to use Incremental Sort - Mailing list pgsql-hackers

From Richard Guo
Subject Re: Allow DISTINCT to use Incremental Sort
Date
Msg-id CAMbWs49WXscbTRyQ_9tqFHDC+v7KrjMn8x4E-zWb-Wm_Qm-BLA@mail.gmail.com
Whole thread Raw
In response to Re: Allow DISTINCT to use Incremental Sort  (David Rowley <dgrowleyml@gmail.com>)
Responses Re: Allow DISTINCT to use Incremental Sort
List pgsql-hackers

On Tue, Jan 10, 2023 at 10:14 AM David Rowley <dgrowleyml@gmail.com> wrote:
>         /* For explicit-sort case, always use the more rigorous clause */
>         if (list_length(root->distinct_pathkeys) <
>             list_length(root->sort_pathkeys))
>         {
>             needed_pathkeys = root->sort_pathkeys;
>             /* Assert checks that parser didn't mess up... */
>             Assert(pathkeys_contained_in(root->distinct_pathkeys,
>                                          needed_pathkeys));
>         }
>         else
>             needed_pathkeys = root->distinct_pathkeys;
>
> I'm not sure if this is necessary, as AFAIU the parser should have
> ensured that the sortClause is always a prefix of distinctClause.

I think that's true for standard DISTINCT, but it's not for DISTINCT ON.

> In the patch this code has been removed.  I think we should also remove
> the related comments in create_final_distinct_paths.
>
>        * When we have DISTINCT ON, we must sort by the more rigorous of
>        * DISTINCT and ORDER BY, else it won't have the desired behavior.
> -      * Also, if we do have to do an explicit sort, we might as well use
> -      * the more rigorous ordering to avoid a second sort later.  (Note
> -      * that the parser will have ensured that one clause is a prefix of
> -      * the other.)

I'm not quite following what you think has changed here.
 
Sorry I didn't make myself clear.  I mean currently on HEAD in planner.c
from line 4847 to line 4857, we have the code to make sure we always use
the more rigorous clause for explicit-sort case.  I think this code is
not necessary, because we have already done that in line 4791 to 4796,
for both DISTINCT ON and standard DISTINCT.

In this patch that code (line 4847 to line 4857) has been removed, which
I agree with.  I just wondered if the related comment should be removed
too.  But now after a second thought, I think it's OK to keep that
comment, as it still holds true in the new patch.
 
I've attached an updated patch
 
The patch looks good to me.

Thanks
Richard

pgsql-hackers by date:

Previous
From: "Regina Obe"
Date:
Subject: RE: [PATCH] Support % wildcard in extension upgrade filenames
Next
From: "shiy.fnst@fujitsu.com"
Date:
Subject: RE: Fix pg_publication_tables to exclude generated columns