Re: Allow DISTINCT to use Incremental Sort - Mailing list pgsql-hackers

From Richard Guo
Subject Re: Allow DISTINCT to use Incremental Sort
Date
Msg-id CAMbWs4_7Jq7ZQRREEYtOec91pymB7kEYXJV9PcBF0ymKVMDQPQ@mail.gmail.com
Whole thread Raw
In response to Allow DISTINCT to use Incremental Sort  (David Rowley <dgrowleyml@gmail.com>)
Responses Re: Allow DISTINCT to use Incremental Sort
List pgsql-hackers

On Sat, Jan 7, 2023 at 5:47 PM David Rowley <dgrowleyml@gmail.com> wrote:
While working on the regression tests added in a14a58329, I noticed
that DISTINCT does not make use of Incremental Sort.  It'll only ever
do full sorts on the cheapest input path or make use of a path that's
already got the required pathkeys.  Also, I see that
create_final_distinct_paths() is a little quirky and if the cheapest
input path happens to be sorted, it'll add_path() the same path twice,
which seems like a bit of a waste of effort. That could happen if say
enable_seqscan is off or if a Merge Join is the cheapest join method.

Additionally, the parallel DISTINCT code looks like it should also get
the same treatment.  I see that I'd coded this to only add a unique
path atop of a presorted path and it never considers sorting the
cheapest partial path.  I've adjusted that in the attached and also
made it consider incremental sorting any path with presorted keys.

Please see the attached patch.
 
+1 for the changes.  A minor comment is that previously on HEAD for
SELECT DISTINCT case, if we have to do an explicit full sort atop the
cheapest path, we try to make sure to always use the more rigorous
ordering.

        /* For explicit-sort case, always use the more rigorous clause */
        if (list_length(root->distinct_pathkeys) <
            list_length(root->sort_pathkeys))
        {
            needed_pathkeys = root->sort_pathkeys;
            /* Assert checks that parser didn't mess up... */
            Assert(pathkeys_contained_in(root->distinct_pathkeys,
                                         needed_pathkeys));
        }
        else
            needed_pathkeys = root->distinct_pathkeys;

I'm not sure if this is necessary, as AFAIU the parser should have
ensured that the sortClause is always a prefix of distinctClause.

In the patch this code has been removed.  I think we should also remove
the related comments in create_final_distinct_paths.

       * When we have DISTINCT ON, we must sort by the more rigorous of
       * DISTINCT and ORDER BY, else it won't have the desired behavior.
-      * Also, if we do have to do an explicit sort, we might as well use
-      * the more rigorous ordering to avoid a second sort later.  (Note
-      * that the parser will have ensured that one clause is a prefix of
-      * the other.)

Also, the comment just above this one is outdated too.

      * First, if we have any adequately-presorted paths, just stick a
      * Unique node on those.  Then consider doing an explicit sort of the
      * cheapest input path and Unique'ing that.

The two-step workflow is what is the case on HEAD but not any more in
the patch.  And I think we should mention incremental sort on any paths
with presorted keys.

Thanks
Richard

pgsql-hackers by date:

Previous
From: Andrew Dunstan
Date:
Subject: Allow +group in pg_ident.conf
Next
From: Maxim Orlov
Date:
Subject: Re: XID formatting and SLRU refactorings (was: Add 64-bit XIDs into PostgreSQL 15)