Add proper planner support for ORDER BY / DISTINCT aggregates - Mailing list pgsql-hackers
From | David Rowley |
---|---|
Subject | Add proper planner support for ORDER BY / DISTINCT aggregates |
Date | |
Msg-id | CAApHDvpHzfo92=R4W0+xVua3BUYCKMckWAmo-2t_KiXN-wYH=w@mail.gmail.com Whole thread Raw |
Responses |
Re: Add proper planner support for ORDER BY / DISTINCT aggregates
Re: Add proper planner support for ORDER BY / DISTINCT aggregates Re: Add proper planner support for ORDER BY / DISTINCT aggregates |
List | pgsql-hackers |
A few years ago I wrote a patch to implement the missing aggregate combine functions for array_agg and string_agg [1]. In the end, the patch was rejected due to some concern [2] that if we allow these aggregates to run in parallel then it might mess up the order in which values are being aggregated by some unsuspecting user who didn't include an ORDER BY clause in the aggregate. It was mentioned at the time that due to nodeAgg.c performing so terribly with ORDER BY aggregates that we couldn't possibly ask users who were upset by this change to include an ORDER BY in their aggregate functions. I'd still quite like to finish off the remaining aggregate functions that still don't have a combine function, so to get that going again I've written some code that gets the query planner onboard with picking a plan that allows nodeAgg.c to skip doing any internal sorts when the input results are already sorted according to what's required by the aggregate function. Of course, the query could have many aggregates all with differing ORDER BY clauses. Since we reuse the same input for each, it might not be possible to feed each aggregate with suitably sorted input. When the input is not sorted, nodeAgg.c still must perform the sort as it does today. So unfortunately we can't remove the nodeAgg.c code for that. The best I could come up with is just picking a sort order that suits the first ORDER BY aggregate, then spin through the remaining ones to see if there's any with a more strict order requirement that we can use to suit that one and the first one. The planner does something similar for window functions already, although it's not quite as comprehensive to look beyond the first window for windows with a more strict sort requirement. This allows us to give presorted input to both aggregates in the following case: SELECT agg(a ORDER BY a),agg2(a ORDER BY a,b) ... but just the first agg in this one: SELECT agg(a ORDER BY a),agg2(a ORDER BY c) ... In order to make DISTINCT work, I had to add a new expression evaluation operator to allow filtering if the current value is the same as the last value. The current unpatched code implements distinct when reading back the sorted tuplestore data. Since I don't have a tuplestore with the pre-sorted version, I needed to cache the last Datum, or last tuple somewhere. I ended up putting that in the AggStatePerTransData struct. I'm not quite sure if I like that, but I don't really see where else it could go. When testing the performance of all this I found that when a suitable index exists to provide pre-sorted input for the aggregation that the performance does improve. Unfortunately, it looks like things get more complex when no index exists. In this case, since we're setting pathkeys to tell the planner we need a plan that provides pre-sorted input to the aggregates, the planner will add a sort below the aggregate node. I initially didn't see any problem with that as it just moves the sort to a Sort node rather than having it done implicitly inside nodeAgg.c. The problem is, it just does not perform as well. I guess this is because when the sort is done inside nodeAgg.c that the transition function is called in a tight loop while reading records back from the tuplestore. In the patched version, there's an additional node transition in between nodeAgg and nodeSort and that causes slower performance. For now, I'm not quite sure what to do about that. We set the plan pathkeys well before we could possibly decide if asking for pre-sorted input for the aggregates would be a good idea or not. Please find attached my WIP patch. It's WIP due to what I mentioned in the above paragraph and also because I've not bothered to add JIT support for the new expression evaluation steps. I'll add this to the July commitfest. David [1] https://www.postgresql.org/message-id/CAKJS1f9sx_6GTcvd6TMuZnNtCh0VhBzhX6FZqw17TgVFH-ga_A%40mail.gmail.com [2] https://www.postgresql.org/message-id/flat/6538.1522096067%40sss.pgh.pa.us#c228ed67026faa15209c76dada035da4
Attachment
pgsql-hackers by date: