Thread: BUG #15795: ERROR: could not find pathkey item to sort
The following bug has been logged on the website: Bug reference: 15795 Logged by: Suresh Kumar R Email address: suresh.arsenal29@gmail.com PostgreSQL version: 10.3 Operating system: Ubuntu 16.04 Description: I have a table 'person' with properties _id, _actual_type_name and name when I tried the below query, I got an error saying 'could not find pathkey item to sort'. Here is query: SELECT DISTINCT A._id0 as _id0, A._actual_type_name0 as _actual_type_name0 FROM ( ( SELECT DISTINCT _id as _id0, _actual_type_name as _actual_type_name0, name as name0 FROM hello_world.person ) union all ( SELECT DISTINCT _id as _id0, _actual_type_name as _actual_type_name0, name as name0 FROM hello_world.person)) as A WHERE ( A.name0 = A.name0 );
On Tuesday, May 7, 2019, PG Bug reporting form <noreply@postgresql.org> wrote:
The following bug has been logged on the website:
Bug reference: 15795
Logged by: Suresh Kumar R
Email address: suresh.arsenal29@gmail.com
PostgreSQL version: 10.3
Operating system: Ubuntu 16.04
Description:
I have a table 'person' with properties _id, _actual_type_name and name when
I tried the below query, I got an error saying 'could not find pathkey item
to sort'.
Upgrade to the latest release (10.7)
David J.
I tried in the latest 10.7 release too. I still get the same error.
Suresh.
On Wed, May 8, 2019 at 12:40 PM David G. Johnston <david.g.johnston@gmail.com> wrote:
On Tuesday, May 7, 2019, PG Bug reporting form <noreply@postgresql.org> wrote:The following bug has been logged on the website:
Bug reference: 15795
Logged by: Suresh Kumar R
Email address: suresh.arsenal29@gmail.com
PostgreSQL version: 10.3
Operating system: Ubuntu 16.04
Description:
I have a table 'person' with properties _id, _actual_type_name and name when
I tried the below query, I got an error saying 'could not find pathkey item
to sort'.Upgrade to the latest release (10.7)David J.
PG Bug reporting form <noreply@postgresql.org> writes: > Here is query: > SELECT DISTINCT A._id0 as _id0, A._actual_type_name0 as _actual_type_name0 > FROM ( ( SELECT DISTINCT _id as _id0, _actual_type_name as > _actual_type_name0, name as name0 FROM hello_world.person ) union all ( > SELECT DISTINCT _id as _id0, _actual_type_name as _actual_type_name0, name > as name0 FROM hello_world.person)) as A WHERE ( A.name0 = A.name0 ); It's politer to provide a self-contained test case, rather than expect us to reverse-engineer details that might be critical. For the archives, though, this isn't hard to reproduce: regression=# create table person(_id int, _actual_type_name text, name text); CREATE TABLE regression=# SELECT DISTINCT A._id0 as _id0, A._actual_type_name0 as _actual_type_name0 FROM ( ( SELECT DISTINCT _id as _id0, _actual_type_name as _actual_type_name0, name as name0 FROM person ) union all ( SELECT DISTINCT _id as _id0, _actual_type_name as _actual_type_name0, name as name0 FROM person)) as A WHERE ( A.name0 = A.name0 ); ERROR: could not find pathkey item to sort Curiously, this only fails for me in 9.6 and 10, not earlier or later branches. regards, tom lane
On 2019/05/08 19:33, Tom Lane wrote: > For the archives, though, this isn't hard to reproduce: > > regression=# create table person(_id int, _actual_type_name text, name text); > CREATE TABLE > regression=# SELECT DISTINCT A._id0 as _id0, A._actual_type_name0 as _actual_type_name0 > FROM ( ( SELECT DISTINCT _id as _id0, _actual_type_name as > _actual_type_name0, name as name0 FROM person ) union all ( > SELECT DISTINCT _id as _id0, _actual_type_name as _actual_type_name0, name > as name0 FROM person)) as A WHERE ( A.name0 = A.name0 ); > ERROR: could not find pathkey item to sort > > Curiously, this only fails for me in 9.6 and 10, not earlier or later > branches. Here are my findings after debugging this on 9.5, 9.6, 10, and 11: As you also said, the problem only seems to occur in 9.6 and 10. It does because the MergeAppend path created for UNION ALL subquery in the outer query's FROM contains one pathkey item too many in its pathkeys field. That's because each of its child subpaths has pathkeys (_id, _actual_type_name, name), whereas the outer query only wants (_id, _actual_type_name). I hoped that convert_subquery_pathkeys() called during child subquery's path creation would've taken the unnecessary "name" out because the outer query doesn't need it, but it didn't. It fails to do so because the subquery's pathkey "name" is being matched successfully to an outer EC installed due to the outer query's WHERE (A.name0 = A.name0); that's via the following code in convert_subquery_pathkeys(): outer_ec = get_eclass_for_sort_expr(root, outer_expr, I checked if and how convert_subquery_pathkeys() determines which of the subquery's pathkeys are useful to the outer query and there does exist a scoring system to evaluate the usefulness of subquery's pathkeys to outer query, but I didn't fully understand it. In any case, the main problem seems to be that convert_subquery_pathkeys() can't keep "name" from appearing in the output pathkeys that it produces. Based on that premise, I added the following code to convert_subquery_pathkeys(): + + if (retvallen == outer_query_keys) + break; which seems to fix the issue. Alternatively, maybe we can apply truncate_useless_pathkeys() to the result of convert_subquery_pathkeys(). The problem doesn't manifest with 9.5 or 11 (even HEAD for that matter), because a sort-based path is not chosen in their case for unique-fying (UNION ALL uses Append, not MergeAppend on cost grounds), so there's no attempt to look for "pathkey item to sort" to begin with. In 11's (and HEAD's) case, even if MergeAppend had won in terms of costing, the problem wouldn't occur at least for this query, because "name" doesn't appear in the outer query's ECs due to the following commit: commit 8ec5429e2f422f4d570d4909507db0d4ca83bbac Author: Tom Lane <tgl@sss.pgh.pa.us> Date: Sun Oct 8 12:23:32 2017 -0400 Reduce "X = X" to "X IS NOT NULL", if it's easy to do so. However it's easy to tweak the query such that "name" *will* end up in outer query's ECs as follows: SELECT DISTINCT A._id0 as _id0, A._actual_type_name0 as _actual_type_name0 FROM ( ( SELECT DISTINCT _id as _id0, _actual_type_name as _actual_type_name0, name as name0 FROM person ) union all ( SELECT DISTINCT _id as _id0, _actual_type_name as _actual_type_name0, name as name0 FROM person)) as A INNER JOIN person A1 ON ( A.name0 = A1.name ); Now, A.name0 and A1.name in the join condition do form a an EC, which does trick convert_subquery_pathkeys() into accepting the child subqueries' "name" key. IOW, if the "fix" I mentioned above is correct, it will have to applied to all the branches, because this seems to be a fundamental problem with convert_subquery_pathkeys(). Thoughts? Thanks, Amit
Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> writes: > On 2019/05/08 19:33, Tom Lane wrote: >> For the archives, though, this isn't hard to reproduce: >> >> regression=# create table person(_id int, _actual_type_name text, name text); >> CREATE TABLE >> regression=# SELECT DISTINCT A._id0 as _id0, A._actual_type_name0 as _actual_type_name0 >> FROM ( ( SELECT DISTINCT _id as _id0, _actual_type_name as >> _actual_type_name0, name as name0 FROM person ) union all ( >> SELECT DISTINCT _id as _id0, _actual_type_name as _actual_type_name0, name >> as name0 FROM person)) as A WHERE ( A.name0 = A.name0 ); >> ERROR: could not find pathkey item to sort >> >> Curiously, this only fails for me in 9.6 and 10, not earlier or later >> branches. > ... In any case, the main problem > seems to be that convert_subquery_pathkeys() can't keep "name" from > appearing in the output pathkeys that it produces. Based on that premise, > I added the following code to convert_subquery_pathkeys(): > + > + if (retvallen == outer_query_keys) > + break; That's just a kluge. > which seems to fix the issue. Alternatively, maybe we can apply > truncate_useless_pathkeys() to the result of convert_subquery_pathkeys(). Yeah, I thought about that too. The comment on convert_subquery_pathkeys that says truncate_useless_pathkeys would do nothing is wrong. However, if you apply truncate_useless_pathkeys to the results, you'll find that some of the regression test query plans change, and not for the better. I'm thinking of changing that comment to read like * We intentionally don't do truncate_useless_pathkeys() here, because there * are situations where seeing the raw ordering of the subquery is helpful. * For example, if it returns ORDER BY x DESC, that may prompt us to * construct a mergejoin using DESC order rather than ASC order; but the * right_merge_direction heuristic would have us throw the knowledge away. Anyway, yes, the basic issue is that convert_subquery_pathkeys is returning pathkeys that are outside the norm for the outer query, and some places are falling over because of that. I've found at least two bugs that I believe are present in multiple branches, although I'm having difficulty in constructing branch-independent tests for them, because the bugs interact with each other and with other changes that we've made. First off, the reported problem can be reproduced with an existing regression-test table, eg regression=# select distinct q1 from (select distinct * from int8_tbl union all select distinct * from int8_tbl) ss where(q2 = q2); ERROR: could not find pathkey item to sort The reason that this doesn't fail in >= v11 is that we do not build an EquivalenceClass for q2 in the outer query, so that there's nothing for convert_subquery_pathkeys to match it to. And that happens because the q2 = q2 clause is converted to "q2 is not null". In <= v10, that clause is left alone and since it's a mergejoinable operator we end up assigning ECs to each side. (Conceivably, since it's not actually a join clause, we don't need to do that, but I'm hesitant to muck with that; it would only be a band-aid over the true problem anyhow.) Anyway, what's happening in v10 is 1. convert_subquery_pathkeys sees that the subpath is sorted by q1, q2, and it successfully generates a 2-element pathkeys list representing that in the outer query, which it can do because (a) q2 is in the subpath tlist and (b) there is an EC in the outer query for q2. The other UNION ALL arm has an identical path, too. 2. When we come to build a MergeAppend path for the UNION ALL, we label it with the common pathkeys list of the two inputs, i.e. q1, q2, although the outer query really only cares about sort-by-q1. 3. When we come to build a plan for the MergeAppend, we need to locate the sort columns in the outputs of its child SubqueryScan plans ... and q2 is not there. Ooops. Now, how is that happening given that convert_subquery_pathkeys is specifically checking that the column is emitted by the subquery? The reason is that *it's checking the wrong thing*. It's looking at what the native output of the subquery is, not at what the SubqueryScan that we're going to stack atop it will produce. And in this case, because q2 is not required by the outer query (once we've pushed the WHERE clause down to or below the subquery scan), q2 is not in the reltarget list for the subquery RTE, so it's not going to be emitted by the SubqueryScan. I haven't decided what to do about this. The minimally invasive fix would be to teach convert_subquery_pathkeys that it can't emit anything not listed in rel->reltarget. That seems like it might lose useful information, though perhaps the consequences are minimal given how hard it is to hit this bug. Better would be to add entries to the reltarget for useful sort columns --- but I think it's likely too late to do so here. (We may have already generated some paths using the existing reltarget contents.) Anyway, we're not out of the woods by any means. Wondering whether the issue couldn't be reproduced in >= v11 by doing something that wouldn't be reduced to an IS NOT NULL, I stumbled across this in HEAD: regression=# select distinct q1 from (select distinct * from int8_tbl union all select distinct * from int8_tbl) ss where(-q1 = q2); psql: server closed the connection unexpectedly That's hitting this assertion: TRAP: FailedAssertion("!(list_length(dest_tlist) == list_length(src_tlist))", File: "tlist.c", Line: 345) Investigation shows that in this case, we successfully generate a Plan, but the final apply_tlist_labeling step fails, because create_merge_append_plan has blithely ignored the CP_EXACT_TLIST flag and stuck on extra resjunk columns. That seems like a pretty clear violation of createplan.c's internal expectations, so I made a patch that teaches that function to stick on a filtering Result if it added extra columns in violation of a flag demanding that it not do so. As of HEAD, create_append_plan has similar logic so I made the same change there, although I have not found a way to reach that bug today. (See patch attached.) With that patch, we successfully build a plan, but it looks a bit odd: Unique -> Result -> Merge Append Sort Key: "*SELECT* 1".q1, ((- "*SELECT* 1".q1)) -> Subquery Scan on "*SELECT* 1" -> Unique -> Sort Sort Key: i81.q1, i81.q2 -> Seq Scan on int8_tbl i81 Filter: ((- q1) = q2) -> Subquery Scan on "*SELECT* 2" -> Unique -> Sort Sort Key: i82.q1, i82.q2 -> Seq Scan on int8_tbl i82 Filter: ((- q1) = q2) What's happening there is that the outer query has an EC containing { -q1, q2 }, and convert_subquery_pathkeys successfully matches the q2 pathkey to that EC. Then, when prepare_sort_from_pathkeys tries to find a sort column for that EC, it still doesn't find q2 in the SubqueryScan output --- but it does find q1, so it can make an expression matching the other EC entry instead of failing. This is kind of grotty, of course. It'd be nicer if the outer MergeAppend only considered as many sort columns as we actually semantically need. But I'm not sure of a good way to arrange that. Another thing that I've not quite got to the bottom of is how come the MergeAppend path is getting stuck with CP_EXACT_TLIST responsibility in the first place. Generally, since we know MergeAppend can't project, there'd be a ProjectionPath on top of it. Usually there is, which is how come this bug has escaped detection this long --- so maybe there's another bug that we really ought to be using a ProjectionPath but are not, in some particular circumstance? Not that I think it would be okay for create_merge_append_plan to just ignore the flag. Anyway, because there are so many different things that can mask these bugs, getting good test cases for them might be hopeless. In the attached I have select distinct q1 from (select distinct * from int8_tbl i81 union all select distinct * from int8_tbl i82) ss where q2 = q2; which exposes the reltarget-vs-subplan-tlist issue, but only in 9.6 and 10, though it surely exists in other branches including HEAD, and select distinct q1 from (select distinct * from int8_tbl i81 union all select distinct * from int8_tbl i82) ss where -q1 = q2; which exposes create_merge_append_plan's misfeasance, but only in 11 and HEAD, though it surely exists at least as far back as v10. Getting more robust test cases would be nice -- any ideas? regards, tom lane diff --git a/src/backend/optimizer/path/pathkeys.c b/src/backend/optimizer/path/pathkeys.c index 68b2204..7ed8279 100644 *** a/src/backend/optimizer/path/pathkeys.c --- b/src/backend/optimizer/path/pathkeys.c *************** build_expression_pathkey(PlannerInfo *ro *** 771,779 **** * 'subquery_pathkeys': the subquery's output pathkeys, in its terms. * 'subquery_tlist': the subquery's output targetlist, in its terms. * ! * It is not necessary for caller to do truncate_useless_pathkeys(), ! * because we select keys in a way that takes usefulness of the keys into ! * account. */ List * convert_subquery_pathkeys(PlannerInfo *root, RelOptInfo *rel, --- 771,781 ---- * 'subquery_pathkeys': the subquery's output pathkeys, in its terms. * 'subquery_tlist': the subquery's output targetlist, in its terms. * ! * We intentionally don't do truncate_useless_pathkeys() here, because there ! * are situations where seeing the raw ordering of the subquery is helpful. ! * For example, if it returns ORDER BY x DESC, that may prompt us to ! * construct a mergejoin using DESC order rather than ASC order; but the ! * right_merge_direction heuristic would have us throw the knowledge away. */ List * convert_subquery_pathkeys(PlannerInfo *root, RelOptInfo *rel, diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c index efe073a..270c119 100644 *** a/src/backend/optimizer/plan/createplan.c --- b/src/backend/optimizer/plan/createplan.c *************** static List *get_gating_quals(PlannerInf *** 81,88 **** static Plan *create_gating_plan(PlannerInfo *root, Path *path, Plan *plan, List *gating_quals); static Plan *create_join_plan(PlannerInfo *root, JoinPath *best_path); ! static Plan *create_append_plan(PlannerInfo *root, AppendPath *best_path); ! static Plan *create_merge_append_plan(PlannerInfo *root, MergeAppendPath *best_path); static Result *create_group_result_plan(PlannerInfo *root, GroupResultPath *best_path); static ProjectSet *create_project_set_plan(PlannerInfo *root, ProjectSetPath *best_path); --- 81,90 ---- static Plan *create_gating_plan(PlannerInfo *root, Path *path, Plan *plan, List *gating_quals); static Plan *create_join_plan(PlannerInfo *root, JoinPath *best_path); ! static Plan *create_append_plan(PlannerInfo *root, AppendPath *best_path, ! int flags); ! static Plan *create_merge_append_plan(PlannerInfo *root, MergeAppendPath *best_path, ! int flags); static Result *create_group_result_plan(PlannerInfo *root, GroupResultPath *best_path); static ProjectSet *create_project_set_plan(PlannerInfo *root, ProjectSetPath *best_path); *************** create_plan_recurse(PlannerInfo *root, P *** 390,400 **** break; case T_Append: plan = create_append_plan(root, ! (AppendPath *) best_path); break; case T_MergeAppend: plan = create_merge_append_plan(root, ! (MergeAppendPath *) best_path); break; case T_Result: if (IsA(best_path, ProjectionPath)) --- 392,404 ---- break; case T_Append: plan = create_append_plan(root, ! (AppendPath *) best_path, ! flags); break; case T_MergeAppend: plan = create_merge_append_plan(root, ! (MergeAppendPath *) best_path, ! flags); break; case T_Result: if (IsA(best_path, ProjectionPath)) *************** create_join_plan(PlannerInfo *root, Join *** 1054,1063 **** * Returns a Plan node. */ static Plan * ! create_append_plan(PlannerInfo *root, AppendPath *best_path) { Append *plan; List *tlist = build_path_tlist(root, &best_path->path); List *pathkeys = best_path->path.pathkeys; List *subplans = NIL; ListCell *subpaths; --- 1058,1069 ---- * Returns a Plan node. */ static Plan * ! create_append_plan(PlannerInfo *root, AppendPath *best_path, int flags) { Append *plan; List *tlist = build_path_tlist(root, &best_path->path); + int orig_tlist_length = list_length(tlist); + bool tlist_was_changed = false; List *pathkeys = best_path->path.pathkeys; List *subplans = NIL; ListCell *subpaths; *************** create_append_plan(PlannerInfo *root, Ap *** 1112,1118 **** if (pathkeys != NIL) { ! /* Compute sort column info, and adjust the Append's tlist as needed */ (void) prepare_sort_from_pathkeys((Plan *) plan, pathkeys, best_path->path.parent->relids, NULL, --- 1118,1129 ---- if (pathkeys != NIL) { ! /* ! * Compute sort column info, and adjust the Append's tlist as needed. ! * Because we pass adjust_tlist_in_place = true, we may ignore the ! * function result; it must be the same plan node. However, we then ! * need to detect whether any tlist entries were added. ! */ (void) prepare_sort_from_pathkeys((Plan *) plan, pathkeys, best_path->path.parent->relids, NULL, *************** create_append_plan(PlannerInfo *root, Ap *** 1122,1127 **** --- 1133,1139 ---- &nodeSortOperators, &nodeCollations, &nodeNullsFirst); + tlist_was_changed = (orig_tlist_length != list_length(plan->plan.targetlist)); } /* Build the plan for each child */ *************** create_append_plan(PlannerInfo *root, Ap *** 1231,1237 **** copy_generic_path_info(&plan->plan, (Path *) best_path); ! return (Plan *) plan; } /* --- 1243,1262 ---- copy_generic_path_info(&plan->plan, (Path *) best_path); ! /* ! * If prepare_sort_from_pathkeys added sort columns, but we were told to ! * produce either the exact tlist or a narrow tlist, we should get rid of ! * the sort columns again. We must inject a projection node to do so. ! */ ! if (tlist_was_changed && (flags & (CP_EXACT_TLIST | CP_SMALL_TLIST))) ! { ! tlist = list_truncate(list_copy(plan->plan.targetlist), ! orig_tlist_length); ! return inject_projection_plan((Plan *) plan, tlist, ! plan->plan.parallel_safe); ! } ! else ! return (Plan *) plan; } /* *************** create_append_plan(PlannerInfo *root, Ap *** 1242,1252 **** * Returns a Plan node. */ static Plan * ! create_merge_append_plan(PlannerInfo *root, MergeAppendPath *best_path) { MergeAppend *node = makeNode(MergeAppend); Plan *plan = &node->plan; List *tlist = build_path_tlist(root, &best_path->path); List *pathkeys = best_path->path.pathkeys; List *subplans = NIL; ListCell *subpaths; --- 1267,1280 ---- * Returns a Plan node. */ static Plan * ! create_merge_append_plan(PlannerInfo *root, MergeAppendPath *best_path, ! int flags) { MergeAppend *node = makeNode(MergeAppend); Plan *plan = &node->plan; List *tlist = build_path_tlist(root, &best_path->path); + int orig_tlist_length = list_length(tlist); + bool tlist_was_changed; List *pathkeys = best_path->path.pathkeys; List *subplans = NIL; ListCell *subpaths; *************** create_merge_append_plan(PlannerInfo *ro *** 1265,1271 **** plan->lefttree = NULL; plan->righttree = NULL; ! /* Compute sort column info, and adjust MergeAppend's tlist as needed */ (void) prepare_sort_from_pathkeys(plan, pathkeys, best_path->path.parent->relids, NULL, --- 1293,1304 ---- plan->lefttree = NULL; plan->righttree = NULL; ! /* ! * Compute sort column info, and adjust MergeAppend's tlist as needed. ! * Because we pass adjust_tlist_in_place = true, we may ignore the ! * function result; it must be the same plan node. However, we then need ! * to detect whether any tlist entries were added. ! */ (void) prepare_sort_from_pathkeys(plan, pathkeys, best_path->path.parent->relids, NULL, *************** create_merge_append_plan(PlannerInfo *ro *** 1275,1280 **** --- 1308,1314 ---- &node->sortOperators, &node->collations, &node->nullsFirst); + tlist_was_changed = (orig_tlist_length != list_length(plan->targetlist)); /* * Now prepare the child plans. We must apply prepare_sort_from_pathkeys *************** create_merge_append_plan(PlannerInfo *ro *** 1371,1377 **** node->mergeplans = subplans; node->part_prune_info = partpruneinfo; ! return (Plan *) node; } /* --- 1405,1422 ---- node->mergeplans = subplans; node->part_prune_info = partpruneinfo; ! /* ! * If prepare_sort_from_pathkeys added sort columns, but we were told to ! * produce either the exact tlist or a narrow tlist, we should get rid of ! * the sort columns again. We must inject a projection node to do so. ! */ ! if (tlist_was_changed && (flags & (CP_EXACT_TLIST | CP_SMALL_TLIST))) ! { ! tlist = list_truncate(list_copy(plan->targetlist), orig_tlist_length); ! return inject_projection_plan(plan, tlist, plan->parallel_safe); ! } ! else ! return plan; } /* diff --git a/src/test/regress/expected/union.out b/src/test/regress/expected/union.out index da70438..c4405ca 100644 *** a/src/test/regress/expected/union.out --- b/src/test/regress/expected/union.out *************** ORDER BY x; *** 915,920 **** --- 915,994 ---- 2 | 4 (1 row) + -- Test cases where the native ordering of a sub-select has more pathkeys + -- than the outer query cares about + explain (costs off) + select distinct q1 from + (select distinct * from int8_tbl i81 + union all + select distinct * from int8_tbl i82) ss + where q2 = q2; + QUERY PLAN + ---------------------------------------------------------- + Unique + -> Merge Append + Sort Key: "*SELECT* 1".q1 + -> Subquery Scan on "*SELECT* 1" + -> Unique + -> Sort + Sort Key: i81.q1, i81.q2 + -> Seq Scan on int8_tbl i81 + Filter: (q2 IS NOT NULL) + -> Subquery Scan on "*SELECT* 2" + -> Unique + -> Sort + Sort Key: i82.q1, i82.q2 + -> Seq Scan on int8_tbl i82 + Filter: (q2 IS NOT NULL) + (15 rows) + + select distinct q1 from + (select distinct * from int8_tbl i81 + union all + select distinct * from int8_tbl i82) ss + where q2 = q2; + q1 + ------------------ + 123 + 4567890123456789 + (2 rows) + + explain (costs off) + select distinct q1 from + (select distinct * from int8_tbl i81 + union all + select distinct * from int8_tbl i82) ss + where -q1 = q2; + QUERY PLAN + ---------------------------------------------------------------- + Unique + -> Result + -> Merge Append + Sort Key: "*SELECT* 1".q1, ((- "*SELECT* 1".q1)) + -> Subquery Scan on "*SELECT* 1" + -> Unique + -> Sort + Sort Key: i81.q1, i81.q2 + -> Seq Scan on int8_tbl i81 + Filter: ((- q1) = q2) + -> Subquery Scan on "*SELECT* 2" + -> Unique + -> Sort + Sort Key: i82.q1, i82.q2 + -> Seq Scan on int8_tbl i82 + Filter: ((- q1) = q2) + (16 rows) + + select distinct q1 from + (select distinct * from int8_tbl i81 + union all + select distinct * from int8_tbl i82) ss + where -q1 = q2; + q1 + ------------------ + 4567890123456789 + (1 row) + -- Test proper handling of parameterized appendrel paths when the -- potential join qual is expensive create function expensivefunc(int) returns int diff --git a/src/test/regress/sql/union.sql b/src/test/regress/sql/union.sql index eed7c8d..5f4881d 100644 *** a/src/test/regress/sql/union.sql --- b/src/test/regress/sql/union.sql *************** SELECT * FROM *** 379,384 **** --- 379,412 ---- WHERE x > 3 ORDER BY x; + -- Test cases where the native ordering of a sub-select has more pathkeys + -- than the outer query cares about + explain (costs off) + select distinct q1 from + (select distinct * from int8_tbl i81 + union all + select distinct * from int8_tbl i82) ss + where q2 = q2; + + select distinct q1 from + (select distinct * from int8_tbl i81 + union all + select distinct * from int8_tbl i82) ss + where q2 = q2; + + explain (costs off) + select distinct q1 from + (select distinct * from int8_tbl i81 + union all + select distinct * from int8_tbl i82) ss + where -q1 = q2; + + select distinct q1 from + (select distinct * from int8_tbl i81 + union all + select distinct * from int8_tbl i82) ss + where -q1 = q2; + -- Test proper handling of parameterized appendrel paths when the -- potential join qual is expensive create function expensivefunc(int) returns int
I wrote: > Now, how is that happening given that convert_subquery_pathkeys > is specifically checking that the column is emitted by the subquery? > The reason is that *it's checking the wrong thing*. It's looking > at what the native output of the subquery is, not at what the > SubqueryScan that we're going to stack atop it will produce. > ... > I haven't decided what to do about this. The minimally invasive > fix would be to teach convert_subquery_pathkeys that it can't emit > anything not listed in rel->reltarget. That seems like it might > lose useful information, though perhaps the consequences are minimal > given how hard it is to hit this bug. I concluded that that should be a reasonable fix. The outer query doesn't really have any use for sort keys that are not relevant to either mergejoins or sorting/grouping, and in either of those cases the sort keys would have been seen to be needed above the scan level so they'd be included in the reltarget list. Hence, attached is a draft patch for that part against v10. (I started there since we don't have a test case to show this bug in HEAD.) I realized that the existing tests for !resjunk are just a half-baked version of "is the column available in the outer query", so I folded those into the improved checks. Notice that with this in place, we don't get the funny extra sort key in the MergeAppend, since convert_subquery_pathkeys doesn't try to create that pathkey. That's good for plan quality I guess but it means that we have no live test case for the bug in create_merge_append_plan --- which is still a bug nonetheless. Anyway, I plan to go ahead with applying the combination of these fixes. If we find better test cases we can add them later, but right now I'm not that hopeful about that. regards, tom lane diff --git a/src/backend/optimizer/path/pathkeys.c b/src/backend/optimizer/path/pathkeys.c index 66c13a7..68431f1 100644 --- a/src/backend/optimizer/path/pathkeys.c +++ b/src/backend/optimizer/path/pathkeys.c @@ -29,6 +29,7 @@ static bool pathkey_is_redundant(PathKey *new_pathkey, List *pathkeys); +static Var *find_var_for_subquery_tle(RelOptInfo *rel, TargetEntry *tle); static bool right_merge_direction(PlannerInfo *root, PathKey *pathkey); @@ -599,9 +600,11 @@ build_expression_pathkey(PlannerInfo *root, * 'subquery_pathkeys': the subquery's output pathkeys, in its terms. * 'subquery_tlist': the subquery's output targetlist, in its terms. * - * It is not necessary for caller to do truncate_useless_pathkeys(), - * because we select keys in a way that takes usefulness of the keys into - * account. + * We intentionally don't do truncate_useless_pathkeys() here, because there + * are situations where seeing the raw ordering of the subquery is helpful. + * For example, if it returns ORDER BY x DESC, that may prompt us to + * construct a mergejoin using DESC order rather than ASC order; but the + * right_merge_direction heuristic would have us throw the knowledge away. */ List * convert_subquery_pathkeys(PlannerInfo *root, RelOptInfo *rel, @@ -627,22 +630,22 @@ convert_subquery_pathkeys(PlannerInfo *root, RelOptInfo *rel, * that same targetlist entry. */ TargetEntry *tle; + Var *outer_var; if (sub_eclass->ec_sortref == 0) /* can't happen */ elog(ERROR, "volatile EquivalenceClass has no sortref"); tle = get_sortgroupref_tle(sub_eclass->ec_sortref, subquery_tlist); Assert(tle); - /* resjunk items aren't visible to outer query */ - if (!tle->resjunk) + /* Is TLE actually available to the outer query? */ + outer_var = find_var_for_subquery_tle(rel, tle); + if (outer_var) { /* We can represent this sub_pathkey */ EquivalenceMember *sub_member; - Expr *outer_expr; EquivalenceClass *outer_ec; Assert(list_length(sub_eclass->ec_members) == 1); sub_member = (EquivalenceMember *) linitial(sub_eclass->ec_members); - outer_expr = (Expr *) makeVarFromTargetEntry(rel->relid, tle); /* * Note: it might look funny to be setting sortref = 0 for a @@ -656,7 +659,7 @@ convert_subquery_pathkeys(PlannerInfo *root, RelOptInfo *rel, */ outer_ec = get_eclass_for_sort_expr(root, - outer_expr, + (Expr *) outer_var, NULL, sub_eclass->ec_opfamilies, sub_member->em_datatype, @@ -713,14 +716,15 @@ convert_subquery_pathkeys(PlannerInfo *root, RelOptInfo *rel, foreach(k, subquery_tlist) { TargetEntry *tle = (TargetEntry *) lfirst(k); + Var *outer_var; Expr *tle_expr; - Expr *outer_expr; EquivalenceClass *outer_ec; PathKey *outer_pk; int score; - /* resjunk items aren't visible to outer query */ - if (tle->resjunk) + /* Is TLE actually available to the outer query? */ + outer_var = find_var_for_subquery_tle(rel, tle); + if (!outer_var) continue; /* @@ -735,16 +739,9 @@ convert_subquery_pathkeys(PlannerInfo *root, RelOptInfo *rel, if (!equal(tle_expr, sub_expr)) continue; - /* - * Build a representation of this targetlist entry as an - * outer Var. - */ - outer_expr = (Expr *) makeVarFromTargetEntry(rel->relid, - tle); - - /* See if we have a matching EC for that */ + /* See if we have a matching EC for the TLE */ outer_ec = get_eclass_for_sort_expr(root, - outer_expr, + (Expr *) outer_var, NULL, sub_eclass->ec_opfamilies, sub_expr_type, @@ -802,6 +799,41 @@ convert_subquery_pathkeys(PlannerInfo *root, RelOptInfo *rel, } /* + * find_var_for_subquery_tle + * + * If the given subquery tlist entry is due to be emitted by the subquery's + * scan node, return a Var for it, else return NULL. + * + * We need this to ensure that we don't return pathkeys describing values + * that are unavailable above the level of the subquery scan. + */ +static Var * +find_var_for_subquery_tle(RelOptInfo *rel, TargetEntry *tle) +{ + ListCell *lc; + + /* If the TLE is resjunk, it's certainly not visible to the outer query */ + if (tle->resjunk) + return NULL; + + /* Search the rel's targetlist to see what it will return */ + foreach(lc, rel->reltarget->exprs) + { + Var *var = (Var *) lfirst(lc); + + /* Ignore placeholders */ + if (!IsA(var, Var)) + continue; + Assert(var->varno == rel->relid); + + /* If we find a Var referencing this TLE, we're good */ + if (var->varattno == tle->resno) + return copyObject(var); /* Make a copy for safety */ + } + return NULL; +} + +/* * build_join_pathkeys * Build the path keys for a join relation constructed by mergejoin or * nestloop join. This is normally the same as the outer path's keys. diff --git a/src/test/regress/expected/union.out b/src/test/regress/expected/union.out index 92d427a..48c27f8 100644 --- a/src/test/regress/expected/union.out +++ b/src/test/regress/expected/union.out @@ -916,6 +916,79 @@ ORDER BY x; 2 | 4 (1 row) +-- Test cases where the native ordering of a sub-select has more pathkeys +-- than the outer query cares about +explain (costs off) +select distinct q1 from + (select distinct * from int8_tbl i81 + union all + select distinct * from int8_tbl i82) ss +where q2 = q2; + QUERY PLAN +-------------------------------------------------------- + Unique + -> Merge Append + Sort Key: "*SELECT* 1".q1 + -> Subquery Scan on "*SELECT* 1" + -> Unique + -> Sort + Sort Key: i81.q1, i81.q2 + -> Seq Scan on int8_tbl i81 + Filter: (q2 = q2) + -> Subquery Scan on "*SELECT* 2" + -> Unique + -> Sort + Sort Key: i82.q1, i82.q2 + -> Seq Scan on int8_tbl i82 + Filter: (q2 = q2) +(15 rows) + +select distinct q1 from + (select distinct * from int8_tbl i81 + union all + select distinct * from int8_tbl i82) ss +where q2 = q2; + q1 +------------------ + 123 + 4567890123456789 +(2 rows) + +explain (costs off) +select distinct q1 from + (select distinct * from int8_tbl i81 + union all + select distinct * from int8_tbl i82) ss +where -q1 = q2; + QUERY PLAN +-------------------------------------------------------- + Unique + -> Merge Append + Sort Key: "*SELECT* 1".q1 + -> Subquery Scan on "*SELECT* 1" + -> Unique + -> Sort + Sort Key: i81.q1, i81.q2 + -> Seq Scan on int8_tbl i81 + Filter: ((- q1) = q2) + -> Subquery Scan on "*SELECT* 2" + -> Unique + -> Sort + Sort Key: i82.q1, i82.q2 + -> Seq Scan on int8_tbl i82 + Filter: ((- q1) = q2) +(15 rows) + +select distinct q1 from + (select distinct * from int8_tbl i81 + union all + select distinct * from int8_tbl i82) ss +where -q1 = q2; + q1 +------------------ + 4567890123456789 +(1 row) + -- Test proper handling of parameterized appendrel paths when the -- potential join qual is expensive create function expensivefunc(int) returns int diff --git a/src/test/regress/sql/union.sql b/src/test/regress/sql/union.sql index eed7c8d..5f4881d 100644 --- a/src/test/regress/sql/union.sql +++ b/src/test/regress/sql/union.sql @@ -379,6 +379,34 @@ SELECT * FROM WHERE x > 3 ORDER BY x; +-- Test cases where the native ordering of a sub-select has more pathkeys +-- than the outer query cares about +explain (costs off) +select distinct q1 from + (select distinct * from int8_tbl i81 + union all + select distinct * from int8_tbl i82) ss +where q2 = q2; + +select distinct q1 from + (select distinct * from int8_tbl i81 + union all + select distinct * from int8_tbl i82) ss +where q2 = q2; + +explain (costs off) +select distinct q1 from + (select distinct * from int8_tbl i81 + union all + select distinct * from int8_tbl i82) ss +where -q1 = q2; + +select distinct q1 from + (select distinct * from int8_tbl i81 + union all + select distinct * from int8_tbl i82) ss +where -q1 = q2; + -- Test proper handling of parameterized appendrel paths when the -- potential join qual is expensive create function expensivefunc(int) returns int