Re: Add proper planner support for ORDER BY / DISTINCT aggregates - Mailing list pgsql-hackers

From Ronan Dunklau
Subject Re: Add proper planner support for ORDER BY / DISTINCT aggregates
Date
Msg-id 5348877.kVHUhDTtjH@aivenronan
Whole thread Raw
In response to Add proper planner support for ORDER BY / DISTINCT aggregates  (David Rowley <dgrowleyml@gmail.com>)
Responses Re: Add proper planner support for ORDER BY / DISTINCT aggregates  (David Rowley <dgrowleyml@gmail.com>)
List pgsql-hackers
> 
> This allows us to give presorted input to both aggregates in the following
> case:
> 
> SELECT agg(a ORDER BY a),agg2(a ORDER BY a,b) ...
> 
> but just the first agg in this one:
> 
> SELECT agg(a ORDER BY a),agg2(a ORDER BY c) ...

I don't know if it's acceptable, but in the case where you add both an 
aggregate with an ORDER BY clause, and another aggregate without the clause, 
the output for the unordered one will change and use the same ordering, maybe 
suprising the unsuspecting user. Would that be acceptable ?

> When testing the performance of all this I found that when a suitable
> index exists to provide pre-sorted input for the aggregation that the
> performance does improve. Unfortunately, it looks like things get more
> complex when no index exists.  In this case, since we're setting
> pathkeys to tell the planner we need a plan that provides pre-sorted
> input to the aggregates, the planner will add a sort below the
> aggregate node.  I initially didn't see any problem with that as it
> just moves the sort to a Sort node rather than having it done
> implicitly inside nodeAgg.c.  The problem is, it just does not perform
> as well.  I guess this is because when the sort is done inside
> nodeAgg.c that the transition function is called in a tight loop while
> reading records back from the tuplestore.  In the patched version,
> there's an additional node transition in between nodeAgg and nodeSort
> and that causes slower performance.  For now, I'm not quite sure what
> to do about that.  We set the plan pathkeys well before we could
> possibly decide if asking for pre-sorted input for the aggregates
> would be a good idea or not.

I was curious about the performance implication of that additional transition, 
and could not reproduce a signifcant difference. I may be doing something 
wrong: how did you highlight it ?

Regards,

--
Ronan Dunklau





pgsql-hackers by date:

Previous
From: gkokolatos@pm.me
Date:
Subject: Re: Teach pg_receivewal to use lz4 compression
Next
From: Dean Rasheed
Date:
Subject: Re: rand48 replacement