Home > mailing lists
Re: "could not find pathkey item to sort" for TPC-DS queries 94-96 - Mailing list pgsql-hackers

From	Tom Lane
Subject	Re: "could not find pathkey item to sort" for TPC-DS queries 94-96
Date	April 17, 2021 22:39:45
Msg-id	355670.1618688385@sss.pgh.pa.us Whole thread Raw
In response to	Re: "could not find pathkey item to sort" for TPC-DS queries 94-96 (Tomas Vondra <tomas.vondra@enterprisedb.com>)
Responses	Re: "could not find pathkey item to sort" for TPC-DS queries 94-96 (Tom Lane <tgl@sss.pgh.pa.us>) Re: "could not find pathkey item to sort" for TPC-DS queries 94-96 (James Coleman <jtc331@gmail.com>)
List	pgsql-hackers
Tree view
[ sorry for not getting to this thread till now ]

Tomas Vondra <tomas.vondra@enterprisedb.com> writes:
> 3) Shouldn't find_em_expr_usable_for_sorting_rel now mostly mimic what
> prepare_sort_from_pathkeys does? That is, try to match the entries
> directly first, before the new pull_vars() business?

Yeah.  I concur that the problem here is that
find_em_expr_usable_for_sorting_rel isn't fully accounting for what
prepare_sort_from_pathkeys can and can't do.  However, I don't like this
patch much:

* As written, I think it may just move the pain somewhere else.  The point
of the logic in prepare_sort_from_pathkeys is to handle either full
expression matches (e.g. sort by "A+B" when "A+B" is an expression in
the input tlist) or computable expressions (sort by "A+B" when A and B
are individually available).  I think you've fixed the second case and
broken the first one.  Now it's possible that the case never arises,
and certainly failing to generate an early sort isn't catastrophic anyway.
But we ought to get it right.

* If the goal is to match what prepare_sort_from_pathkeys can do, I
think that doubling down on the strategy of having a duplicate copy
is not the path to a maintainable fix.

I think it's time for some refactoring of this code so that we can
actually share the logic.  Accordingly, I propose the attached.
It's really not that hard to share, as long as you accept the idea
that the list passed to the shared subroutine can be either a list of
TargetEntries or of bare expressions.

Also, I don't much care for either the name or API of
find_em_expr_usable_for_sorting_rel.  The sole current caller only
really needs a boolean result, and if it did need more than that
it'd likely need the whole EquivalenceMember not just the em_expr
(certainly createplan.c does).  So 0002 attached is some bikeshedding
on that.  I kept that separate because it might be wise to do it only
in HEAD, just in case somebody out there is calling the function from
an extension.

(BTW, responding to an upthread question: I think the looping to
remove multiple levels of RelabelType is probably now redundant,
but I didn't remove it.  If we want to do that there are more
places to touch than just this code.)

            regards, tom lane

diff --git a/src/backend/optimizer/path/equivclass.c b/src/backend/optimizer/path/equivclass.c
index 0188c1e9a1..5f7e505950 100644
--- a/src/backend/optimizer/path/equivclass.c
+++ b/src/backend/optimizer/path/equivclass.c
@@ -35,6 +35,7 @@
 static EquivalenceMember *add_eq_member(EquivalenceClass *ec,
                                         Expr *expr, Relids relids, Relids nullable_relids,
                                         bool is_child, Oid datatype);
+static bool exprlist_member_ignore_relabel(Expr *node, List *exprs);
 static void generate_base_implied_equalities_const(PlannerInfo *root,
                                                    EquivalenceClass *ec);
 static void generate_base_implied_equalities_no_const(PlannerInfo *root,
@@ -769,6 +770,149 @@ get_eclass_for_sort_expr(PlannerInfo *root,
     return newec;
 }

+/*
+ * find_ec_member_matching_expr
+ *        Locate an EquivalenceClass member matching the given expr, if any;
+ *        return NULL if no match.
+ *
+ * "Matching" is defined as "equal after stripping RelabelTypes".
+ * This is used for identifying sort expressions, and we need to allow
+ * binary-compatible relabeling for some cases involving binary-compatible
+ * sort operators.
+ *
+ * Child EC members are ignored unless they belong to given 'relids'.
+ */
+EquivalenceMember *
+find_ec_member_matching_expr(EquivalenceClass *ec,
+                             Expr *expr,
+                             Relids relids)
+{
+    ListCell   *lc;
+
+    /* We ignore binary-compatible relabeling on both ends */
+    while (expr && IsA(expr, RelabelType))
+        expr = ((RelabelType *) expr)->arg;
+
+    foreach(lc, ec->ec_members)
+    {
+        EquivalenceMember *em = (EquivalenceMember *) lfirst(lc);
+        Expr       *emexpr;
+
+        /*
+         * We shouldn't be trying to sort by an equivalence class that
+         * contains a constant, so no need to consider such cases any further.
+         */
+        if (em->em_is_const)
+            continue;
+
+        /*
+         * Ignore child members unless they belong to the requested rel.
+         */
+        if (em->em_is_child &&
+            !bms_is_subset(em->em_relids, relids))
+            continue;
+
+        /* Match if same expression (after stripping relabel) */
+        emexpr = em->em_expr;
+        while (emexpr && IsA(emexpr, RelabelType))
+            emexpr = ((RelabelType *) emexpr)->arg;
+
+        if (equal(emexpr, expr))
+            return em;
+    }
+
+    return NULL;
+}
+
+/*
+ * find_computable_ec_member
+ *        Locate an EquivalenceClass member that can be computed from the
+ *        expressions appearing in "exprs"; return NULL if no match.
+ *
+ * "exprs" can be either a list of bare expression trees, or a list of
+ * TargetEntry nodes.  Either way, it should contain Vars and possibly
+ * Aggrefs and WindowFuncs, which are matched to the corresponding elements
+ * of the EquivalenceClass's expressions.  As in find_ec_member_matching_expr,
+ * we ignore binary-compatible relabeling.
+ *
+ * Child EC members are ignored unless they belong to given 'relids'.
+ */
+EquivalenceMember *
+find_computable_ec_member(EquivalenceClass *ec,
+                          List *exprs,
+                          Relids relids)
+{
+    ListCell   *lc;
+
+    foreach(lc, ec->ec_members)
+    {
+        EquivalenceMember *em = (EquivalenceMember *) lfirst(lc);
+        List       *exprvars;
+        ListCell   *lc2;
+
+        /*
+         * We shouldn't be trying to sort by an equivalence class that
+         * contains a constant, so no need to consider such cases any further.
+         */
+        if (em->em_is_const)
+            continue;
+
+        /*
+         * Ignore child members unless they belong to the requested rel.
+         */
+        if (em->em_is_child &&
+            !bms_is_subset(em->em_relids, relids))
+            continue;
+
+        /* Match if all Vars and quasi-Vars are available in exprs */
+        exprvars = pull_var_clause((Node *) em->em_expr,
+                                   PVC_INCLUDE_AGGREGATES |
+                                   PVC_INCLUDE_WINDOWFUNCS |
+                                   PVC_INCLUDE_PLACEHOLDERS);
+        foreach(lc2, exprvars)
+        {
+            if (!exprlist_member_ignore_relabel(lfirst(lc2), exprs))
+                break;
+        }
+        list_free(exprvars);
+        if (!lc2)
+            return em;            /* found usable expression */
+    }
+
+    return NULL;
+}
+
+/*
+ * exprlist_member_ignore_relabel
+ *      Subroutine for find_computable_ec_member: is "node" in "exprs"?
+ *
+ * Per the requirements of that function, "exprs" might or might not have
+ * TargetEntry superstructure, and we ignore RelabelType.
+ */
+static bool
+exprlist_member_ignore_relabel(Expr *node, List *exprs)
+{
+    ListCell   *lc;
+
+    while (node && IsA(node, RelabelType))
+        node = ((RelabelType *) node)->arg;
+
+    foreach(lc, exprs)
+    {
+        Expr       *expr = (Expr *) lfirst(lc);
+
+        if (expr && IsA(expr, TargetEntry))
+            expr = ((TargetEntry *) expr)->expr;
+
+        while (expr && IsA(expr, RelabelType))
+            expr = ((RelabelType *) expr)->arg;
+
+        if (equal(node, expr))
+            return true;
+    }
+    return false;
+}
+
 /*
  * Find an equivalence class member expression, all of whose Vars, come from
  * the indicated relation.
@@ -799,71 +943,80 @@ find_em_expr_for_rel(EquivalenceClass *ec, RelOptInfo *rel)
 }

 /*
- * Find an equivalence class member expression that can be safely used to build
- * a sort node using the provided relation. The rules are a subset of those
- * applied in prepare_sort_from_pathkeys since that function deals with sorts
- * that must be delayed until the last stages of query execution, while here
- * we only care about proactive sorts.
+ * Find an equivalence class member expression that can be used to build
+ * a sort node using the provided relation; return NULL if no candidate.
+ *
+ * The selected expression must be one that prepare_sort_from_pathkeys knows
+ * how to deal with, and we also apply a few additional constraints based on
+ * the fact that the desired sort will be done within the scan/join part of
+ * the plan.
  */
 Expr *
 find_em_expr_usable_for_sorting_rel(PlannerInfo *root, EquivalenceClass *ec,
                                     RelOptInfo *rel, bool require_parallel_safe)
 {
-    ListCell   *lc_em;
+    PathTarget *target = rel->reltarget;
+    EquivalenceMember *em;
+    ListCell   *lc;

     /*
-     * If there is more than one equivalence member matching these
-     * requirements we'll be content to choose any one of them.
+     * Reject volatile ECs immediately; such sorts must always be postponed.
      */
-    foreach(lc_em, ec->ec_members)
+    if (ec->ec_has_volatile)
+        return NULL;
+
+    /*
+     * Try to find an EM directly matching some reltarget member.
+     */
+    foreach(lc, target->exprs)
     {
-        EquivalenceMember *em = lfirst(lc_em);
-        Expr       *em_expr = em->em_expr;
+        Expr       *targetexpr = (Expr *) lfirst(lc);

-        /*
-         * We shouldn't be trying to sort by an equivalence class that
-         * contains a constant, so no need to consider such cases any further.
-         */
-        if (em->em_is_const)
+        em = find_ec_member_matching_expr(ec, targetexpr, rel->relids);
+        if (!em)
             continue;

         /*
-         * Any Vars in the equivalence class member need to come from this
-         * relation. This is a superset of prepare_sort_from_pathkeys ignoring
-         * child members unless they belong to the rel being sorted.
+         * Reject expressions involving set-returning functions, as those
+         * can't be computed early either.
          */
-        if (!bms_is_subset(em->em_relids, rel->relids))
+        if (IS_SRF_CALL((Node *) em->em_expr))
             continue;

         /*
          * If requested, reject expressions that are not parallel-safe.
          */
-        if (require_parallel_safe && !is_parallel_safe(root, (Node *) em_expr))
-            continue;
-
-        /*
-         * Disallow SRFs so that all of them can be evaluated at the correct
-         * time as determined by make_sort_input_target.
-         */
-        if (IS_SRF_CALL((Node *) em_expr))
+        if (require_parallel_safe && !is_parallel_safe(root,
+                                                       (Node *) em->em_expr))
             continue;

-        /*
-         * As long as the expression isn't volatile then
-         * prepare_sort_from_pathkeys is able to generate a new target entry,
-         * so there's no need to verify that one already exists.
-         *
-         * While prepare_sort_from_pathkeys has to be concerned about matching
-         * up a volatile expression to the proper tlist entry, it doesn't seem
-         * valuable here to expend the work trying to find a match in the
-         * target's exprs since such a sort will have to be postponed anyway.
-         */
-        if (!ec->ec_has_volatile)
-            return em->em_expr;
+        return em->em_expr;
     }

-    /* We didn't find any suitable equivalence class expression */
-    return NULL;
+    /*
+     * Try to find a computable expression.
+     */
+    em = find_computable_ec_member(ec, target->exprs, rel->relids);
+    if (!em)
+        return NULL;
+
+    /*
+     * Reject expressions involving set-returning functions, as those can't be
+     * computed early either.  (In principle, we could look for another EC
+     * member that isn't a SRF, but it's quite unlikely there'd be one.)
+     */
+    if (IS_SRF_CALL((Node *) em->em_expr))
+        return NULL;
+
+    /*
+     * If requested, reject expressions that are not parallel-safe.  (Again,
+     * it doesn't seem worth the trouble to see if there is another option.)
+     */
+    if (require_parallel_safe && !is_parallel_safe(root,
+                                                   (Node *) em->em_expr))
+        return NULL;
+
+    return em->em_expr;
 }

 /*
diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index 22f10fa339..77c30e4c0e 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -269,9 +269,6 @@ static Plan *prepare_sort_from_pathkeys(Plan *lefttree, List *pathkeys,
                                         Oid **p_sortOperators,
                                         Oid **p_collations,
                                         bool **p_nullsFirst);
-static EquivalenceMember *find_ec_member_for_tle(EquivalenceClass *ec,
-                                                 TargetEntry *tle,
-                                                 Relids relids);
 static Sort *make_sort_from_pathkeys(Plan *lefttree, List *pathkeys,
                                      Relids relids);
 static IncrementalSort *make_incrementalsort_from_pathkeys(Plan *lefttree,
@@ -2110,7 +2107,7 @@ create_sort_plan(PlannerInfo *root, SortPath *best_path, int flags)
                                   flags | CP_SMALL_TLIST);

     /*
-     * make_sort_from_pathkeys() indirectly calls find_ec_member_for_tle(),
+     * make_sort_from_pathkeys indirectly calls find_ec_member_matching_expr,
      * which will ignore any child EC members that don't belong to the given
      * relids. Thus, if this sort path is based on a child relation, we must
      * pass its relids.
@@ -6017,9 +6014,6 @@ make_incrementalsort(Plan *lefttree, int numCols, int nPresortedCols,
  *
  * Returns the node which is to be the input to the Sort (either lefttree,
  * or a Result stacked atop lefttree).
- *
- * Note: Restrictions on what expressions are safely sortable may also need to
- * be added to find_em_expr_usable_for_sorting_rel.
  */
 static Plan *
 prepare_sort_from_pathkeys(Plan *lefttree, List *pathkeys,
@@ -6089,7 +6083,7 @@ prepare_sort_from_pathkeys(Plan *lefttree, List *pathkeys,
             tle = get_tle_by_resno(tlist, reqColIdx[numsortkeys]);
             if (tle)
             {
-                em = find_ec_member_for_tle(ec, tle, relids);
+                em = find_ec_member_matching_expr(ec, tle->expr, relids);
                 if (em)
                 {
                     /* found expr at right place in tlist */
@@ -6120,7 +6114,7 @@ prepare_sort_from_pathkeys(Plan *lefttree, List *pathkeys,
             foreach(j, tlist)
             {
                 tle = (TargetEntry *) lfirst(j);
-                em = find_ec_member_for_tle(ec, tle, relids);
+                em = find_ec_member_matching_expr(ec, tle->expr, relids);
                 if (em)
                 {
                     /* found expr already in tlist */
@@ -6134,56 +6128,12 @@ prepare_sort_from_pathkeys(Plan *lefttree, List *pathkeys,
         if (!tle)
         {
             /*
-             * No matching tlist item; look for a computable expression. Note
-             * that we treat Aggrefs as if they were variables; this is
-             * necessary when attempting to sort the output from an Agg node
-             * for use in a WindowFunc (since grouping_planner will have
-             * treated the Aggrefs as variables, too).  Likewise, if we find a
-             * WindowFunc in a sort expression, treat it as a variable.
+             * No matching tlist item; look for a computable expression.
              */
-            Expr       *sortexpr = NULL;
-
-            foreach(j, ec->ec_members)
-            {
-                EquivalenceMember *em = (EquivalenceMember *) lfirst(j);
-                List       *exprvars;
-                ListCell   *k;
-
-                /*
-                 * We shouldn't be trying to sort by an equivalence class that
-                 * contains a constant, so no need to consider such cases any
-                 * further.
-                 */
-                if (em->em_is_const)
-                    continue;
-
-                /*
-                 * Ignore child members unless they belong to the rel being
-                 * sorted.
-                 */
-                if (em->em_is_child &&
-                    !bms_is_subset(em->em_relids, relids))
-                    continue;
-
-                sortexpr = em->em_expr;
-                exprvars = pull_var_clause((Node *) sortexpr,
-                                           PVC_INCLUDE_AGGREGATES |
-                                           PVC_INCLUDE_WINDOWFUNCS |
-                                           PVC_INCLUDE_PLACEHOLDERS);
-                foreach(k, exprvars)
-                {
-                    if (!tlist_member_ignore_relabel(lfirst(k), tlist))
-                        break;
-                }
-                list_free(exprvars);
-                if (!k)
-                {
-                    pk_datatype = em->em_datatype;
-                    break;        /* found usable expression */
-                }
-            }
-            if (!j)
+            em = find_computable_ec_member(ec, tlist, relids);
+            if (!em)
                 elog(ERROR, "could not find pathkey item to sort");
+            pk_datatype = em->em_datatype;

             /*
              * Do we need to insert a Result node?
@@ -6203,7 +6153,7 @@ prepare_sort_from_pathkeys(Plan *lefttree, List *pathkeys,
             /*
              * Add resjunk entry to input's tlist
              */
-            tle = makeTargetEntry(sortexpr,
+            tle = makeTargetEntry(copyObject(em->em_expr),
                                   list_length(tlist) + 1,
                                   NULL,
                                   true);
@@ -6242,56 +6192,6 @@ prepare_sort_from_pathkeys(Plan *lefttree, List *pathkeys,
     return lefttree;
 }

-/*
- * find_ec_member_for_tle
- *        Locate an EquivalenceClass member matching the given TLE, if any
- *
- * Child EC members are ignored unless they belong to given 'relids'.
- */
-static EquivalenceMember *
-find_ec_member_for_tle(EquivalenceClass *ec,
-                       TargetEntry *tle,
-                       Relids relids)
-{
-    Expr       *tlexpr;
-    ListCell   *lc;
-
-    /* We ignore binary-compatible relabeling on both ends */
-    tlexpr = tle->expr;
-    while (tlexpr && IsA(tlexpr, RelabelType))
-        tlexpr = ((RelabelType *) tlexpr)->arg;
-
-    foreach(lc, ec->ec_members)
-    {
-        EquivalenceMember *em = (EquivalenceMember *) lfirst(lc);
-        Expr       *emexpr;
-
-        /*
-         * We shouldn't be trying to sort by an equivalence class that
-         * contains a constant, so no need to consider such cases any further.
-         */
-        if (em->em_is_const)
-            continue;
-
-        /*
-         * Ignore child members unless they belong to the rel being sorted.
-         */
-        if (em->em_is_child &&
-            !bms_is_subset(em->em_relids, relids))
-            continue;
-
-        /* Match if same expression (after stripping relabel) */
-        emexpr = em->em_expr;
-        while (emexpr && IsA(emexpr, RelabelType))
-            emexpr = ((RelabelType *) emexpr)->arg;
-
-        if (equal(emexpr, tlexpr))
-            return em;
-    }
-
-    return NULL;
-}
-
 /*
  * make_sort_from_pathkeys
  *      Create sort plan to sort according to given pathkeys
@@ -6753,7 +6653,7 @@ make_unique_from_pathkeys(Plan *lefttree, List *pathkeys, int numCols)
             foreach(j, plan->targetlist)
             {
                 tle = (TargetEntry *) lfirst(j);
-                em = find_ec_member_for_tle(ec, tle, NULL);
+                em = find_ec_member_matching_expr(ec, tle->expr, NULL);
                 if (em)
                 {
                     /* found expr already in tlist */
diff --git a/src/backend/optimizer/util/tlist.c b/src/backend/optimizer/util/tlist.c
index 8a26288070..311579d059 100644
--- a/src/backend/optimizer/util/tlist.c
+++ b/src/backend/optimizer/util/tlist.c
@@ -79,34 +79,6 @@ tlist_member(Expr *node, List *targetlist)
     return NULL;
 }

-/*
- * tlist_member_ignore_relabel
- *      Same as above, except that we ignore top-level RelabelType nodes
- *      while checking for a match.  This is needed for some scenarios
- *      involving binary-compatible sort operations.
- */
-TargetEntry *
-tlist_member_ignore_relabel(Expr *node, List *targetlist)
-{
-    ListCell   *temp;
-
-    while (node && IsA(node, RelabelType))
-        node = ((RelabelType *) node)->arg;
-
-    foreach(temp, targetlist)
-    {
-        TargetEntry *tlentry = (TargetEntry *) lfirst(temp);
-        Expr       *tlexpr = tlentry->expr;
-
-        while (tlexpr && IsA(tlexpr, RelabelType))
-            tlexpr = ((RelabelType *) tlexpr)->arg;
-
-        if (equal(node, tlexpr))
-            return tlentry;
-    }
-    return NULL;
-}
-
 /*
  * tlist_member_match_var
  *      Same as above, except that we match the provided Var on the basis
diff --git a/src/include/optimizer/paths.h b/src/include/optimizer/paths.h
index 035d3e1206..1d2a608ad9 100644
--- a/src/include/optimizer/paths.h
+++ b/src/include/optimizer/paths.h
@@ -135,6 +135,12 @@ extern EquivalenceClass *get_eclass_for_sort_expr(PlannerInfo *root,
                                                   Index sortref,
                                                   Relids rel,
                                                   bool create_it);
+extern EquivalenceMember *find_ec_member_matching_expr(EquivalenceClass *ec,
+                                                       Expr *expr,
+                                                       Relids relids);
+extern EquivalenceMember *find_computable_ec_member(EquivalenceClass *ec,
+                                                    List *exprs,
+                                                    Relids relids);
 extern Expr *find_em_expr_for_rel(EquivalenceClass *ec, RelOptInfo *rel);
 extern Expr *find_em_expr_usable_for_sorting_rel(PlannerInfo *root,
                                                  EquivalenceClass *ec,
diff --git a/src/include/optimizer/tlist.h b/src/include/optimizer/tlist.h
index e081ef2d5e..d62a09665a 100644
--- a/src/include/optimizer/tlist.h
+++ b/src/include/optimizer/tlist.h
@@ -18,7 +18,6 @@


 extern TargetEntry *tlist_member(Expr *node, List *targetlist);
-extern TargetEntry *tlist_member_ignore_relabel(Expr *node, List *targetlist);

 extern List *add_to_flat_tlist(List *tlist, List *exprs);

diff --git a/src/test/regress/expected/incremental_sort.out b/src/test/regress/expected/incremental_sort.out
index a417b566d9..545e301e48 100644
--- a/src/test/regress/expected/incremental_sort.out
+++ b/src/test/regress/expected/incremental_sort.out
@@ -1579,6 +1579,32 @@ order by 1, 2;
    ->  Function Scan on generate_series
 (7 rows)

+-- Parallel sort with an aggregate that can be safely generated in parallel,
+-- but we can't sort by partial aggregate values.
+explain (costs off) select count(*)
+from tenk1 t1
+join tenk1 t2 on t1.unique1 = t2.unique2
+join tenk1 t3 on t2.unique1 = t3.unique1
+order by count(*);
+                                          QUERY PLAN
+-----------------------------------------------------------------------------------------------
+ Sort
+   Sort Key: (count(*))
+   ->  Finalize Aggregate
+         ->  Gather
+               Workers Planned: 2
+               ->  Partial Aggregate
+                     ->  Parallel Hash Join
+                           Hash Cond: (t2.unique1 = t3.unique1)
+                           ->  Parallel Hash Join
+                                 Hash Cond: (t1.unique1 = t2.unique2)
+                                 ->  Parallel Index Only Scan using tenk1_unique1 on tenk1 t1
+                                 ->  Parallel Hash
+                                       ->  Parallel Index Scan using tenk1_unique2 on tenk1 t2
+                           ->  Parallel Hash
+                                 ->  Parallel Index Only Scan using tenk1_unique1 on tenk1 t3
+(15 rows)
+
 -- Parallel sort but with expression (correlated subquery) that
 -- is prohibited in parallel plans.
 explain (costs off) select distinct
diff --git a/src/test/regress/sql/incremental_sort.sql b/src/test/regress/sql/incremental_sort.sql
index 81429164d4..d8768a6b54 100644
--- a/src/test/regress/sql/incremental_sort.sql
+++ b/src/test/regress/sql/incremental_sort.sql
@@ -257,6 +257,13 @@ from tenk1, lateral (select tenk1.unique1 from generate_series(1, 1000)) as sub;
 explain (costs off) select sub.unique1, md5(stringu1)
 from tenk1, lateral (select tenk1.unique1 from generate_series(1, 1000)) as sub
 order by 1, 2;
+-- Parallel sort with an aggregate that can be safely generated in parallel,
+-- but we can't sort by partial aggregate values.
+explain (costs off) select count(*)
+from tenk1 t1
+join tenk1 t2 on t1.unique1 = t2.unique2
+join tenk1 t3 on t2.unique1 = t3.unique1
+order by count(*);
 -- Parallel sort but with expression (correlated subquery) that
 -- is prohibited in parallel plans.
 explain (costs off) select distinct
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index edba5e49a8..286052e3d6 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -2697,20 +2697,19 @@ get_useful_pathkeys_for_relation(PlannerInfo *root, RelOptInfo *rel,
             EquivalenceClass *pathkey_ec = pathkey->pk_eclass;

             /*
-             * We can only build a sort for pathkeys which contain an EC
-             * member in the current relation's target, so ignore any suffix
-             * of the list as soon as we find a pathkey without an EC member
-             * in the relation.
+             * We can only build a sort for pathkeys that contain a safe EC
+             * member computable from the current relation's reltarget, so
+             * ignore the remainder of the list as soon as we find a pathkey
+             * without such a member.
              *
-             * By still returning the prefix of the pathkeys list that does
-             * meet criteria of EC membership in the current relation, we
-             * enable not just an incremental sort on the entirety of
-             * query_pathkeys but also incremental sort below a JOIN.
+             * It's still worthwhile to return any prefix of the pathkeys list
+             * that meets this requirement, as we may be able to do an
+             * incremental sort.
              *
              * If requested, ensure the expression is parallel safe too.
              */
-            if (!find_em_expr_usable_for_sorting_rel(root, pathkey_ec, rel,
-                                                     require_parallel_safe))
+            if (!relation_has_safe_ec_member(root, rel, pathkey_ec,
+                                             require_parallel_safe))
                 break;

             npathkeys++;
diff --git a/src/backend/optimizer/path/equivclass.c b/src/backend/optimizer/path/equivclass.c
index 5f7e505950..9d2d1022d1 100644
--- a/src/backend/optimizer/path/equivclass.c
+++ b/src/backend/optimizer/path/equivclass.c
@@ -943,17 +943,20 @@ find_em_expr_for_rel(EquivalenceClass *ec, RelOptInfo *rel)
 }

 /*
- * Find an equivalence class member expression that can be used to build
- * a sort node using the provided relation; return NULL if no candidate.
+ * relation_has_safe_ec_member
+ *        Can this relation be sorted on a "safe" member of this EC?
  *
- * The selected expression must be one that prepare_sort_from_pathkeys knows
- * how to deal with, and we also apply a few additional constraints based on
- * the fact that the desired sort will be done within the scan/join part of
- * the plan.
+ * To succeed, we must find an EC member that prepare_sort_from_pathkeys knows
+ * how to sort on, given the rel's reltarget as input.  There are also a few
+ * additional constraints based on the fact that the desired sort will be done
+ * within the scan/join part of the plan.
+ *
+ * At some point we might want to return the identified EquivalenceMember,
+ * but for now, callers only want to know if there is one.
  */
-Expr *
-find_em_expr_usable_for_sorting_rel(PlannerInfo *root, EquivalenceClass *ec,
-                                    RelOptInfo *rel, bool require_parallel_safe)
+bool
+relation_has_safe_ec_member(PlannerInfo *root, RelOptInfo *rel,
+                            EquivalenceClass *ec, bool require_parallel_safe)
 {
     PathTarget *target = rel->reltarget;
     EquivalenceMember *em;
@@ -963,7 +966,7 @@ find_em_expr_usable_for_sorting_rel(PlannerInfo *root, EquivalenceClass *ec,
      * Reject volatile ECs immediately; such sorts must always be postponed.
      */
     if (ec->ec_has_volatile)
-        return NULL;
+        return false;

     /*
      * Try to find an EM directly matching some reltarget member.
@@ -990,7 +993,7 @@ find_em_expr_usable_for_sorting_rel(PlannerInfo *root, EquivalenceClass *ec,
                                                        (Node *) em->em_expr))
             continue;

-        return em->em_expr;
+        return true;
     }

     /*
@@ -998,7 +1001,7 @@ find_em_expr_usable_for_sorting_rel(PlannerInfo *root, EquivalenceClass *ec,
      */
     em = find_computable_ec_member(ec, target->exprs, rel->relids);
     if (!em)
-        return NULL;
+        return false;

     /*
      * Reject expressions involving set-returning functions, as those can't be
@@ -1006,7 +1009,7 @@ find_em_expr_usable_for_sorting_rel(PlannerInfo *root, EquivalenceClass *ec,
      * member that isn't a SRF, but it's quite unlikely there'd be one.)
      */
     if (IS_SRF_CALL((Node *) em->em_expr))
-        return NULL;
+        return false;

     /*
      * If requested, reject expressions that are not parallel-safe.  (Again,
@@ -1014,9 +1017,9 @@ find_em_expr_usable_for_sorting_rel(PlannerInfo *root, EquivalenceClass *ec,
      */
     if (require_parallel_safe && !is_parallel_safe(root,
                                                    (Node *) em->em_expr))
-        return NULL;
+        return false;

-    return em->em_expr;
+    return true;
 }

 /*
diff --git a/src/include/optimizer/paths.h b/src/include/optimizer/paths.h
index 1d2a608ad9..f13b9a37e9 100644
--- a/src/include/optimizer/paths.h
+++ b/src/include/optimizer/paths.h
@@ -142,10 +142,9 @@ extern EquivalenceMember *find_computable_ec_member(EquivalenceClass *ec,
                                                     List *exprs,
                                                     Relids relids);
 extern Expr *find_em_expr_for_rel(EquivalenceClass *ec, RelOptInfo *rel);
-extern Expr *find_em_expr_usable_for_sorting_rel(PlannerInfo *root,
-                                                 EquivalenceClass *ec,
-                                                 RelOptInfo *rel,
-                                                 bool require_parallel_safe);
+extern bool relation_has_safe_ec_member(PlannerInfo *root, RelOptInfo *rel,
+                                        EquivalenceClass *ec,
+                                        bool require_parallel_safe);
 extern void generate_base_implied_equalities(PlannerInfo *root);
 extern List *generate_join_implied_equalities(PlannerInfo *root,
                                               Relids join_relids,
pgsql-hackers by date:
From: "David G. Johnston"
Date: 17 April 2021, 22:24:57
Subject: Re: feature request ctid cast / sql exception
From: Mark Dilger
Date: 17 April 2021, 22:43:21
Subject: Re: pg_amcheck option to install extension
Re: "could not find pathkey item to sort" for TPC-DS queries 94-96 - Mailing list pgsql-hackers

Previous

Next