Thread: Is select_outer_pathkeys_for_merge() too strict now we have Incremental Sorts?
Is select_outer_pathkeys_for_merge() too strict now we have Incremental Sorts?
From
David Rowley
Date:
Hackers, Currently, if we have a query such as: SELECT a,b, COUNT(*) FROM a INNER JOIN b on a.a = b.x GROUP BY a,b ORDER BY a DESC, b; With enable_hashagg = off, we get the following plan: QUERY PLAN --------------------------------------- GroupAggregate Group Key: a.a, a.b -> Sort Sort Key: a.a DESC, a.b -> Merge Join Merge Cond: (a.a = b.x) -> Sort Sort Key: a.a -> Seq Scan on a -> Sort Sort Key: b.x -> Seq Scan on b We can see that the merge join picked to sort the input on a.a rather than a.a DESC. This is due to the way select_outer_pathkeys_for_merge() only picks the query_pathkeys as a prefix of the join pathkeys if we can find all of the join pathkeys in the query_pathkeys. I think we can relax this now that we have incremental sort. I think a better way to limit this is to allow a prefix of the query_pathkeys providing that covers *all* of the join pathkeys. That way, for the above query, it leaves it open for the planner to do the Merge Join by sorting by a.a DESC then just do an Incremental Sort to get the GroupAggregate input sorted by a.b. I've attached a patch for this and it changes the plan for the above query to: QUERY PLAN ---------------------------------------- GroupAggregate Group Key: a.a, a.b -> Incremental Sort Sort Key: a.a DESC, a.b Presorted Key: a.a -> Merge Join Merge Cond: (a.a = b.x) -> Sort Sort Key: a.a DESC -> Seq Scan on a -> Sort Sort Key: b.x DESC -> Seq Scan on b The current behaviour is causing me a bit of trouble in plan regression for the ORDER BY / DISTINCT aggregate improvement patch that I have pending. Is there any reason that we shouldn't do this? David
Attachment
Re: Is select_outer_pathkeys_for_merge() too strict now we have Incremental Sorts?
From
David Rowley
Date:
On Wed, 20 Jul 2022 at 15:02, David Rowley <dgrowleyml@gmail.com> wrote: > I've attached a patch for this and it changes the plan for the above query to: Looks like I based that patch on the wrong branch. Here's another version of the patch that's based on master. David
Attachment
Re: Is select_outer_pathkeys_for_merge() too strict now we have Incremental Sorts?
From
Richard Guo
Date:
On Wed, Jul 20, 2022 at 11:03 AM David Rowley <dgrowleyml@gmail.com> wrote:
I think we can relax this now that we have incremental sort. I think
a better way to limit this is to allow a prefix of the query_pathkeys
providing that covers *all* of the join pathkeys. That way, for the
above query, it leaves it open for the planner to do the Merge Join by
sorting by a.a DESC then just do an Incremental Sort to get the
GroupAggregate input sorted by a.b.
So the idea is if the ECs used by the mergeclauses are prefix of the
query_pathkeys, we use this prefix as pathkeys for the mergejoin. Why
not relax this further that if the ECs in the mergeclauses and the
query_pathkeys have common prefix, we use that prefix as pathkeys? So
that we can have a plan like below:
# explain (costs off) select * from t1 join t2 on t1.c = t2.c and t1.a = t2.a order by t1.a DESC, t1.b;
QUERY PLAN
-------------------------------------------------------
Incremental Sort
Sort Key: t1.a DESC, t1.b
Presorted Key: t1.a
-> Merge Join
Merge Cond: ((t1.a = t2.a) AND (t1.c = t2.c))
-> Sort
Sort Key: t1.a DESC, t1.c
-> Seq Scan on t1
-> Sort
Sort Key: t2.a DESC, t2.c
-> Seq Scan on t2
(11 rows)
Thanks
Richard
query_pathkeys, we use this prefix as pathkeys for the mergejoin. Why
not relax this further that if the ECs in the mergeclauses and the
query_pathkeys have common prefix, we use that prefix as pathkeys? So
that we can have a plan like below:
# explain (costs off) select * from t1 join t2 on t1.c = t2.c and t1.a = t2.a order by t1.a DESC, t1.b;
QUERY PLAN
-------------------------------------------------------
Incremental Sort
Sort Key: t1.a DESC, t1.b
Presorted Key: t1.a
-> Merge Join
Merge Cond: ((t1.a = t2.a) AND (t1.c = t2.c))
-> Sort
Sort Key: t1.a DESC, t1.c
-> Seq Scan on t1
-> Sort
Sort Key: t2.a DESC, t2.c
-> Seq Scan on t2
(11 rows)
Thanks
Richard
Re: Is select_outer_pathkeys_for_merge() too strict now we have Incremental Sorts?
From
David Rowley
Date:
On Wed, 20 Jul 2022 at 21:19, Richard Guo <guofenglinux@gmail.com> wrote: > So the idea is if the ECs used by the mergeclauses are prefix of the > query_pathkeys, we use this prefix as pathkeys for the mergejoin. Why > not relax this further that if the ECs in the mergeclauses and the > query_pathkeys have common prefix, we use that prefix as pathkeys? So > that we can have a plan like below: I don't think that's a clear-cut win. There is scoring code in there to try to arrange the pathkey list in the order of most-useful-to-upper-level-joins firsts. If we were to do as you describe we could end up generating worse plans when there is some subsequent Merge Join above this one that has join conditions that the query_pathkeys are not compatible with. Maybe your idea could be made to work in cases where bms_equal(joinrel->relids, root->all_baserels). In that case, we should not be processing any further joins and don't need to consider that as a factor for the scoring. David
Re: Is select_outer_pathkeys_for_merge() too strict now we have Incremental Sorts?
From
Richard Guo
Date:
On Thu, Jul 21, 2022 at 3:47 AM David Rowley <dgrowleyml@gmail.com> wrote:
On Wed, 20 Jul 2022 at 21:19, Richard Guo <guofenglinux@gmail.com> wrote:
> So the idea is if the ECs used by the mergeclauses are prefix of the
> query_pathkeys, we use this prefix as pathkeys for the mergejoin. Why
> not relax this further that if the ECs in the mergeclauses and the
> query_pathkeys have common prefix, we use that prefix as pathkeys? So
> that we can have a plan like below:
I don't think that's a clear-cut win. There is scoring code in there
to try to arrange the pathkey list in the order of
most-useful-to-upper-level-joins firsts. If we were to do as you
describe we could end up generating worse plans when there is some
subsequent Merge Join above this one that has join conditions that the
query_pathkeys are not compatible with.
Yeah, you're right. Although we would try different permutation of the
pathkeys in sort_inner_and_outer() but that does not cover every
possible ordering due to cost consideration. So we still need to respect
the heuristics behind the pathkey order returned by this function, which
is the scoring logic trying to list most-useful-to-upper-level-joins
keys earlier.
pathkeys in sort_inner_and_outer() but that does not cover every
possible ordering due to cost consideration. So we still need to respect
the heuristics behind the pathkey order returned by this function, which
is the scoring logic trying to list most-useful-to-upper-level-joins
keys earlier.
Maybe your idea could be made to work in cases where
bms_equal(joinrel->relids, root->all_baserels). In that case, we
should not be processing any further joins and don't need to consider
that as a factor for the scoring.
That should work, as long as this case is common enough to worth we
writing the codes.
Thanks
Richard
writing the codes.
Thanks
Richard
Re: Is select_outer_pathkeys_for_merge() too strict now we have Incremental Sorts?
From
David Rowley
Date:
On Thu, 21 Jul 2022 at 14:23, Richard Guo <guofenglinux@gmail.com> wrote: > > On Thu, Jul 21, 2022 at 3:47 AM David Rowley <dgrowleyml@gmail.com> wrote: >> Maybe your idea could be made to work in cases where >> bms_equal(joinrel->relids, root->all_baserels). In that case, we >> should not be processing any further joins and don't need to consider >> that as a factor for the scoring. > > > That should work, as long as this case is common enough to worth we > writing the codes. Thanks for looking at this patch. I've now pushed it. David