Re: BUG #15795: ERROR: could not find pathkey item to sort - Mailing list pgsql-bugs

From Tom Lane
Subject Re: BUG #15795: ERROR: could not find pathkey item to sort
Date
Msg-id 26224.1557429860@sss.pgh.pa.us
Whole thread Raw
In response to Re: BUG #15795: ERROR: could not find pathkey item to sort  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-bugs
I wrote:
> Now, how is that happening given that convert_subquery_pathkeys
> is specifically checking that the column is emitted by the subquery?
> The reason is that *it's checking the wrong thing*.  It's looking
> at what the native output of the subquery is, not at what the
> SubqueryScan that we're going to stack atop it will produce.
> ...
> I haven't decided what to do about this.  The minimally invasive
> fix would be to teach convert_subquery_pathkeys that it can't emit
> anything not listed in rel->reltarget.  That seems like it might
> lose useful information, though perhaps the consequences are minimal
> given how hard it is to hit this bug.

I concluded that that should be a reasonable fix.  The outer query
doesn't really have any use for sort keys that are not relevant to
either mergejoins or sorting/grouping, and in either of those cases
the sort keys would have been seen to be needed above the scan level
so they'd be included in the reltarget list.

Hence, attached is a draft patch for that part against v10.  (I started
there since we don't have a test case to show this bug in HEAD.)
I realized that the existing tests for !resjunk are just a half-baked
version of "is the column available in the outer query", so I folded
those into the improved checks.

Notice that with this in place, we don't get the funny extra
sort key in the MergeAppend, since convert_subquery_pathkeys doesn't
try to create that pathkey.  That's good for plan quality I guess
but it means that we have no live test case for the bug in
create_merge_append_plan --- which is still a bug nonetheless.

Anyway, I plan to go ahead with applying the combination of these
fixes.  If we find better test cases we can add them later, but
right now I'm not that hopeful about that.

            regards, tom lane

diff --git a/src/backend/optimizer/path/pathkeys.c b/src/backend/optimizer/path/pathkeys.c
index 66c13a7..68431f1 100644
--- a/src/backend/optimizer/path/pathkeys.c
+++ b/src/backend/optimizer/path/pathkeys.c
@@ -29,6 +29,7 @@


 static bool pathkey_is_redundant(PathKey *new_pathkey, List *pathkeys);
+static Var *find_var_for_subquery_tle(RelOptInfo *rel, TargetEntry *tle);
 static bool right_merge_direction(PlannerInfo *root, PathKey *pathkey);


@@ -599,9 +600,11 @@ build_expression_pathkey(PlannerInfo *root,
  * 'subquery_pathkeys': the subquery's output pathkeys, in its terms.
  * 'subquery_tlist': the subquery's output targetlist, in its terms.
  *
- * It is not necessary for caller to do truncate_useless_pathkeys(),
- * because we select keys in a way that takes usefulness of the keys into
- * account.
+ * We intentionally don't do truncate_useless_pathkeys() here, because there
+ * are situations where seeing the raw ordering of the subquery is helpful.
+ * For example, if it returns ORDER BY x DESC, that may prompt us to
+ * construct a mergejoin using DESC order rather than ASC order; but the
+ * right_merge_direction heuristic would have us throw the knowledge away.
  */
 List *
 convert_subquery_pathkeys(PlannerInfo *root, RelOptInfo *rel,
@@ -627,22 +630,22 @@ convert_subquery_pathkeys(PlannerInfo *root, RelOptInfo *rel,
              * that same targetlist entry.
              */
             TargetEntry *tle;
+            Var           *outer_var;

             if (sub_eclass->ec_sortref == 0)    /* can't happen */
                 elog(ERROR, "volatile EquivalenceClass has no sortref");
             tle = get_sortgroupref_tle(sub_eclass->ec_sortref, subquery_tlist);
             Assert(tle);
-            /* resjunk items aren't visible to outer query */
-            if (!tle->resjunk)
+            /* Is TLE actually available to the outer query? */
+            outer_var = find_var_for_subquery_tle(rel, tle);
+            if (outer_var)
             {
                 /* We can represent this sub_pathkey */
                 EquivalenceMember *sub_member;
-                Expr       *outer_expr;
                 EquivalenceClass *outer_ec;

                 Assert(list_length(sub_eclass->ec_members) == 1);
                 sub_member = (EquivalenceMember *) linitial(sub_eclass->ec_members);
-                outer_expr = (Expr *) makeVarFromTargetEntry(rel->relid, tle);

                 /*
                  * Note: it might look funny to be setting sortref = 0 for a
@@ -656,7 +659,7 @@ convert_subquery_pathkeys(PlannerInfo *root, RelOptInfo *rel,
                  */
                 outer_ec =
                     get_eclass_for_sort_expr(root,
-                                             outer_expr,
+                                             (Expr *) outer_var,
                                              NULL,
                                              sub_eclass->ec_opfamilies,
                                              sub_member->em_datatype,
@@ -713,14 +716,15 @@ convert_subquery_pathkeys(PlannerInfo *root, RelOptInfo *rel,
                 foreach(k, subquery_tlist)
                 {
                     TargetEntry *tle = (TargetEntry *) lfirst(k);
+                    Var           *outer_var;
                     Expr       *tle_expr;
-                    Expr       *outer_expr;
                     EquivalenceClass *outer_ec;
                     PathKey    *outer_pk;
                     int            score;

-                    /* resjunk items aren't visible to outer query */
-                    if (tle->resjunk)
+                    /* Is TLE actually available to the outer query? */
+                    outer_var = find_var_for_subquery_tle(rel, tle);
+                    if (!outer_var)
                         continue;

                     /*
@@ -735,16 +739,9 @@ convert_subquery_pathkeys(PlannerInfo *root, RelOptInfo *rel,
                     if (!equal(tle_expr, sub_expr))
                         continue;

-                    /*
-                     * Build a representation of this targetlist entry as an
-                     * outer Var.
-                     */
-                    outer_expr = (Expr *) makeVarFromTargetEntry(rel->relid,
-                                                                 tle);
-
-                    /* See if we have a matching EC for that */
+                    /* See if we have a matching EC for the TLE */
                     outer_ec = get_eclass_for_sort_expr(root,
-                                                        outer_expr,
+                                                        (Expr *) outer_var,
                                                         NULL,
                                                         sub_eclass->ec_opfamilies,
                                                         sub_expr_type,
@@ -802,6 +799,41 @@ convert_subquery_pathkeys(PlannerInfo *root, RelOptInfo *rel,
 }

 /*
+ * find_var_for_subquery_tle
+ *
+ * If the given subquery tlist entry is due to be emitted by the subquery's
+ * scan node, return a Var for it, else return NULL.
+ *
+ * We need this to ensure that we don't return pathkeys describing values
+ * that are unavailable above the level of the subquery scan.
+ */
+static Var *
+find_var_for_subquery_tle(RelOptInfo *rel, TargetEntry *tle)
+{
+    ListCell   *lc;
+
+    /* If the TLE is resjunk, it's certainly not visible to the outer query */
+    if (tle->resjunk)
+        return NULL;
+
+    /* Search the rel's targetlist to see what it will return */
+    foreach(lc, rel->reltarget->exprs)
+    {
+        Var           *var = (Var *) lfirst(lc);
+
+        /* Ignore placeholders */
+        if (!IsA(var, Var))
+            continue;
+        Assert(var->varno == rel->relid);
+
+        /* If we find a Var referencing this TLE, we're good */
+        if (var->varattno == tle->resno)
+            return copyObject(var); /* Make a copy for safety */
+    }
+    return NULL;
+}
+
+/*
  * build_join_pathkeys
  *      Build the path keys for a join relation constructed by mergejoin or
  *      nestloop join.  This is normally the same as the outer path's keys.
diff --git a/src/test/regress/expected/union.out b/src/test/regress/expected/union.out
index 92d427a..48c27f8 100644
--- a/src/test/regress/expected/union.out
+++ b/src/test/regress/expected/union.out
@@ -916,6 +916,79 @@ ORDER BY x;
  2 | 4
 (1 row)

+-- Test cases where the native ordering of a sub-select has more pathkeys
+-- than the outer query cares about
+explain (costs off)
+select distinct q1 from
+  (select distinct * from int8_tbl i81
+   union all
+   select distinct * from int8_tbl i82) ss
+where q2 = q2;
+                       QUERY PLAN
+--------------------------------------------------------
+ Unique
+   ->  Merge Append
+         Sort Key: "*SELECT* 1".q1
+         ->  Subquery Scan on "*SELECT* 1"
+               ->  Unique
+                     ->  Sort
+                           Sort Key: i81.q1, i81.q2
+                           ->  Seq Scan on int8_tbl i81
+                                 Filter: (q2 = q2)
+         ->  Subquery Scan on "*SELECT* 2"
+               ->  Unique
+                     ->  Sort
+                           Sort Key: i82.q1, i82.q2
+                           ->  Seq Scan on int8_tbl i82
+                                 Filter: (q2 = q2)
+(15 rows)
+
+select distinct q1 from
+  (select distinct * from int8_tbl i81
+   union all
+   select distinct * from int8_tbl i82) ss
+where q2 = q2;
+        q1
+------------------
+              123
+ 4567890123456789
+(2 rows)
+
+explain (costs off)
+select distinct q1 from
+  (select distinct * from int8_tbl i81
+   union all
+   select distinct * from int8_tbl i82) ss
+where -q1 = q2;
+                       QUERY PLAN
+--------------------------------------------------------
+ Unique
+   ->  Merge Append
+         Sort Key: "*SELECT* 1".q1
+         ->  Subquery Scan on "*SELECT* 1"
+               ->  Unique
+                     ->  Sort
+                           Sort Key: i81.q1, i81.q2
+                           ->  Seq Scan on int8_tbl i81
+                                 Filter: ((- q1) = q2)
+         ->  Subquery Scan on "*SELECT* 2"
+               ->  Unique
+                     ->  Sort
+                           Sort Key: i82.q1, i82.q2
+                           ->  Seq Scan on int8_tbl i82
+                                 Filter: ((- q1) = q2)
+(15 rows)
+
+select distinct q1 from
+  (select distinct * from int8_tbl i81
+   union all
+   select distinct * from int8_tbl i82) ss
+where -q1 = q2;
+        q1
+------------------
+ 4567890123456789
+(1 row)
+
 -- Test proper handling of parameterized appendrel paths when the
 -- potential join qual is expensive
 create function expensivefunc(int) returns int
diff --git a/src/test/regress/sql/union.sql b/src/test/regress/sql/union.sql
index eed7c8d..5f4881d 100644
--- a/src/test/regress/sql/union.sql
+++ b/src/test/regress/sql/union.sql
@@ -379,6 +379,34 @@ SELECT * FROM
 WHERE x > 3
 ORDER BY x;

+-- Test cases where the native ordering of a sub-select has more pathkeys
+-- than the outer query cares about
+explain (costs off)
+select distinct q1 from
+  (select distinct * from int8_tbl i81
+   union all
+   select distinct * from int8_tbl i82) ss
+where q2 = q2;
+
+select distinct q1 from
+  (select distinct * from int8_tbl i81
+   union all
+   select distinct * from int8_tbl i82) ss
+where q2 = q2;
+
+explain (costs off)
+select distinct q1 from
+  (select distinct * from int8_tbl i81
+   union all
+   select distinct * from int8_tbl i82) ss
+where -q1 = q2;
+
+select distinct q1 from
+  (select distinct * from int8_tbl i81
+   union all
+   select distinct * from int8_tbl i82) ss
+where -q1 = q2;
+
 -- Test proper handling of parameterized appendrel paths when the
 -- potential join qual is expensive
 create function expensivefunc(int) returns int

pgsql-bugs by date:

Previous
From: Alan Jackson
Date:
Subject: inconsistent results querying table partitioned by date
Next
From: Tom Lane
Date:
Subject: Re: inconsistent results querying table partitioned by date