Re: Add proper planner support for ORDER BY / DISTINCT aggregates - Mailing list pgsql-hackers

From Ronan Dunklau
Subject Re: Add proper planner support for ORDER BY / DISTINCT aggregates
Date
Msg-id 4285468.TQqHX9kgJM@aivenronan
Whole thread Raw
In response to Re: Add proper planner support for ORDER BY / DISTINCT aggregates  (David Rowley <dgrowleyml@gmail.com>)
Responses Re: Add proper planner support for ORDER BY / DISTINCT aggregates  (David Rowley <dgrowleyml@gmail.com>)
Re: Add proper planner support for ORDER BY / DISTINCT aggregates  (David Rowley <dgrowleyml@gmail.com>)
List pgsql-hackers
> Looks like I did a sloppy job of that.  I messed up the condition in
> standard_qp_callback() that sets the ORDER BY aggregate pathkeys so
> that it accidentally set them when there was an unsortable GROUP BY
> clause, as highlighted by the postgres_fdw tests failing.  I've now
> added a comment to explain why the condition is the way it is so that
> I don't forget again.
> 
> Here's a cleaned-up version that passes make check-world.
> 

I've noticed that when the ORDER BY is a grouping key (which to be honest 
doesn't seem to make much sense to me), the sort key is duplicated, as 
demonstrated by one of the modified tests (partition_aggregate.sql). 

This leads to additional sort nodes being added when there is no necessity to 
do so. In the case of sort and index pathes, the duplicate keys are not 
considered,  I think the same should apply here.

It means the logic for appending the order by pathkeys to the existing group 
by pathkeys would ideally need to remove the redundant group by keys from the 
order by keys, considering this example:

regression=# explain select sum(unique1 order by ten, two), sum(unique1 order 
by four), sum(unique1 order by two, four) from tenk2 group by ten;
                               QUERY PLAN                               
------------------------------------------------------------------------
 GroupAggregate  (cost=1109.39..1234.49 rows=10 width=28)
   Group Key: ten
   ->  Sort  (cost=1109.39..1134.39 rows=10000 width=16)
         Sort Key: ten, ten, two
         ->  Seq Scan on tenk2  (cost=0.00..445.00 rows=10000 width=16)


We would ideally like to sort on ten, two, four to satisfy the first and last 
aggref at once. Stripping the common prefix (ten) would eliminate this problem. 

Also, existing regression tests cover the first problem (order by a grouping 
key) but I feel like they should be extended with a case similar as the above 
to check which pathkeys are used in the "multiple ordered aggregates + group 
by" cases. 


-- 
Ronan Dunklau





pgsql-hackers by date:

Previous
From: Peter Eisentraut
Date:
Subject: Re: Replace remaining castNode(…, lfirst(…)) and friends calls with l*_node()
Next
From: Masahiko Sawada
Date:
Subject: Re: Skipping logical replication transactions on subscriber side