Re: Allow DISTINCT to use Incremental Sort - Mailing list pgsql-hackers

From David Rowley
Subject Re: Allow DISTINCT to use Incremental Sort
Date
Msg-id CAApHDvqgejygRzqP1Scpdvor+aWye8qeyO3nA3YKBAeV7XZvGQ@mail.gmail.com
Whole thread Raw
In response to Re: Allow DISTINCT to use Incremental Sort  (Richard Guo <guofenglinux@gmail.com>)
Responses Re: Allow DISTINCT to use Incremental Sort
List pgsql-hackers
Thanks for having a look at this.

On Tue, 10 Jan 2023 at 02:28, Richard Guo <guofenglinux@gmail.com> wrote:
> +1 for the changes.  A minor comment is that previously on HEAD for
> SELECT DISTINCT case, if we have to do an explicit full sort atop the
> cheapest path, we try to make sure to always use the more rigorous
> ordering.

I'm not quite sure I follow what's changed here.  As far as I see it
the code still does what it did and uses the more rigorous sort.

postgres=# explain (costs off) select distinct on (relname) * from
pg_Class order by relname, oid;
            QUERY PLAN
----------------------------------
 Unique
   ->  Sort
         Sort Key: relname, oid
         ->  Seq Scan on pg_class

If it didn't, then there'd have been a Sort atop of the Unique to
ORDER BY relname,oid in the above.

Maybe you were looking at create_partial_distinct_paths()? We don't do
anything there for DISTINCT ON, although perhaps we could. Just not
for this patch.

>
>         /* For explicit-sort case, always use the more rigorous clause */
>         if (list_length(root->distinct_pathkeys) <
>             list_length(root->sort_pathkeys))
>         {
>             needed_pathkeys = root->sort_pathkeys;
>             /* Assert checks that parser didn't mess up... */
>             Assert(pathkeys_contained_in(root->distinct_pathkeys,
>                                          needed_pathkeys));
>         }
>         else
>             needed_pathkeys = root->distinct_pathkeys;
>
> I'm not sure if this is necessary, as AFAIU the parser should have
> ensured that the sortClause is always a prefix of distinctClause.

I think that's true for standard DISTINCT, but it's not for DISTINCT ON.

> In the patch this code has been removed.  I think we should also remove
> the related comments in create_final_distinct_paths.
>
>        * When we have DISTINCT ON, we must sort by the more rigorous of
>        * DISTINCT and ORDER BY, else it won't have the desired behavior.
> -      * Also, if we do have to do an explicit sort, we might as well use
> -      * the more rigorous ordering to avoid a second sort later.  (Note
> -      * that the parser will have ensured that one clause is a prefix of
> -      * the other.)

I'm not quite following what you think has changed here.

> Also, the comment just above this one is outdated too.
>
>       * First, if we have any adequately-presorted paths, just stick a
>       * Unique node on those.  Then consider doing an explicit sort of the
>       * cheapest input path and Unique'ing that.
>
> The two-step workflow is what is the case on HEAD but not any more in
> the patch.  And I think we should mention incremental sort on any paths
> with presorted keys.

Yeah, I agree, that comment should mention incremental sort.

I've attached an updated patch

David

Attachment

pgsql-hackers by date:

Previous
From: Andres Freund
Date:
Subject: Re: refactoring relation extension and BufferAlloc(), faster COPY
Next
From: Julien Rouhaud
Date:
Subject: Re: Schema variables - new implementation for Postgres 15 (typo)