Is select_outer_pathkeys_for_merge() too strict now we have Incremental Sorts? - Mailing list pgsql-hackers

From David Rowley
Subject Is select_outer_pathkeys_for_merge() too strict now we have Incremental Sorts?
Date
Msg-id CAApHDvrtZu0PHVfDPFM4Yx3jNR2Wuwosv+T2zqa7LrhhBr2rRg@mail.gmail.com
Whole thread Raw
Responses Re: Is select_outer_pathkeys_for_merge() too strict now we have Incremental Sorts?
Re: Is select_outer_pathkeys_for_merge() too strict now we have Incremental Sorts?
List pgsql-hackers
Hackers,

Currently, if we have a query such as:

SELECT a,b, COUNT(*)
FROM a
INNER JOIN b on a.a = b.x
GROUP BY a,b
ORDER BY a DESC, b;

With enable_hashagg = off, we get the following plan:

              QUERY PLAN
---------------------------------------
 GroupAggregate
   Group Key: a.a, a.b
   ->  Sort
         Sort Key: a.a DESC, a.b
         ->  Merge Join
               Merge Cond: (a.a = b.x)
               ->  Sort
                     Sort Key: a.a
                     ->  Seq Scan on a
               ->  Sort
                     Sort Key: b.x
                     ->  Seq Scan on b

We can see that the merge join picked to sort the input on a.a rather
than a.a DESC.  This is due to the way
select_outer_pathkeys_for_merge() only picks the query_pathkeys as a
prefix of the join pathkeys if we can find all of the join pathkeys in
the query_pathkeys.

I think we can relax this now that we have incremental sort.  I think
a better way to limit this is to allow a prefix of the query_pathkeys
providing that covers *all* of the join pathkeys.  That way, for the
above query, it leaves it open for the planner to do the Merge Join by
sorting by a.a DESC then just do an Incremental Sort to get the
GroupAggregate input sorted by a.b.

I've attached a patch for this and it changes the plan for the above query to:

               QUERY PLAN
----------------------------------------
 GroupAggregate
   Group Key: a.a, a.b
   ->  Incremental Sort
         Sort Key: a.a DESC, a.b
         Presorted Key: a.a
         ->  Merge Join
               Merge Cond: (a.a = b.x)
               ->  Sort
                     Sort Key: a.a DESC
                     ->  Seq Scan on a
               ->  Sort
                     Sort Key: b.x DESC
                     ->  Seq Scan on b

The current behaviour is causing me a bit of trouble in plan
regression for the ORDER BY / DISTINCT aggregate improvement patch
that I have pending.

Is there any reason that we shouldn't do this?

David

Attachment

pgsql-hackers by date:

Previous
From: Masahiko Sawada
Date:
Subject: Re: [BUG] Logical replication failure "ERROR: could not map filenode "base/13237/442428" to relation OID" with catalog modifying txns
Next
From: Amit Kapila
Date:
Subject: Re: [BUG] Logical replication failure "ERROR: could not map filenode "base/13237/442428" to relation OID" with catalog modifying txns