Re: Use LIMIT instead of Unique for DISTINCT when all distinct pathkeys are redundant - Mailing list pgsql-hackers

From Richard Guo
Subject Re: Use LIMIT instead of Unique for DISTINCT when all distinct pathkeys are redundant
Date
Msg-id CAMbWs49zGrqG2dg_AcP6TWmCQ+3o4eJvWC533q3jJ5O=X3N=CA@mail.gmail.com
Whole thread Raw
In response to Re: Use LIMIT instead of Unique for DISTINCT when all distinct pathkeys are redundant  (David Rowley <dgrowleyml@gmail.com>)
Responses Re: Use LIMIT instead of Unique for DISTINCT when all distinct pathkeys are redundant  (David Rowley <dgrowleyml@gmail.com>)
List pgsql-hackers

On Thu, Oct 13, 2022 at 4:50 AM David Rowley <dgrowleyml@gmail.com> wrote:
The problem is that I'm only added the LimitPath to the
cheapest_total_path.  I think to make your case work we'd need to add
the LimitPath only in cases where the distinct_pathkeys are empty but
the sort_pathkeys are not and hasDistinctOn is true and the path has
pathkeys_contained_in(root->sort_pathkeys, path->pathkeys).  I think
that's doable, but it's become quite a bit more complex than the patch
I proposed. Maybe it's worth a 2nd effort for that part?
 
Currently in the patch the optimization is done before we check for
presorted paths or do the explicit sort of the cheapest path. How about
we move this optimization into the branch where we've found any
presorted paths?  Maybe something like:

--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -4780,11 +4780,24 @@ create_final_distinct_paths(PlannerInfo *root, RelOptInfo *input_rel,

  if (pathkeys_contained_in(needed_pathkeys, path->pathkeys))
  {
-     add_path(distinct_rel, (Path *)
-              create_upper_unique_path(root, distinct_rel,
-                                       path,
-                                       list_length(root->distinct_pathkeys),
-                                       numDistinctRows));
+     if (root->distinct_pathkeys == NIL)
+     {
+         Node       *limitCount = (Node *) makeConst(INT8OID, -1, InvalidOid,
+                                                     sizeof(int64),
+                                                     Int64GetDatum(1), false,
+                                                     FLOAT8PASSBYVAL);
+
+         add_path(distinct_rel, (Path *)
+                  create_limit_path(root, distinct_rel,
+                                    path, NULL, limitCount,
+                                    LIMIT_OPTION_COUNT, 0, 1));
+     }
+     else
+         add_path(distinct_rel, (Path *)
+                  create_upper_unique_path(root, distinct_rel,
+                                           path,
+                                           list_length(root->distinct_pathkeys),
+                                           numDistinctRows));

This again makes the code less 'straightforward', just to cover a more
uncommon case. I'm also not sure if it's worth doing.

Thanks
Richard

pgsql-hackers by date:

Previous
From: Richard Guo
Date:
Subject: Re: Use LIMIT instead of Unique for DISTINCT when all distinct pathkeys are redundant
Next
From: Amit Kapila
Date:
Subject: Re: Perform streaming logical transactions by background workers and parallel apply