Re: POC: GROUP BY optimization - Mailing list pgsql-hackers
From | Andrei Lepikhov |
---|---|
Subject | Re: POC: GROUP BY optimization |
Date | |
Msg-id | 6c67b7bb-5469-475f-8988-0a50b4c64852@postgrespro.ru Whole thread Raw |
In response to | Re: POC: GROUP BY optimization (Alexander Korotkov <aekorotkov@gmail.com>) |
Responses |
Re: POC: GROUP BY optimization
|
List | pgsql-hackers |
On 21/12/2023 17:53, Alexander Korotkov wrote: > On Sun, Oct 1, 2023 at 11:45 AM Andrei Lepikhov > <a.lepikhov@postgrespro.ru> wrote: >> New version of the patch. Fixed minor inconsistencies and rebased onto >> current master. > Thank you (and other authors) for working on this subject. Indeed to > GROUP BY clauses are order-agnostic. Reordering them in the most > suitable order could give up significant query planning benefits. I > went through the thread: I see significant work has been already made > on this patch, the code is quite polished. Maybe, but issues, mentioned in [1], still not resolved. It is the only reason, why this thread hasn't been active. > I'd like to make some notes. > 1) As already mentioned, there is clearly a repetitive pattern for the > code following after get_useful_group_keys_orderings() calls. I think > it would be good to extract it into a separate function. Please, do > this as a separate patch coming before the group-by patch. That would > simplify the review. Yeah, these parts of code a bit different. I will try to make common routine. > 2) I wonder what planning overhead this patch could introduce? Could > you try to measure the worst case? What if we have a table with a lot > of indexes and a long list of group-by clauses partially patching > every index. This should give us an understanding on whether we need > a separate GUC to control this feature. Ok> 3) I see that get_useful_group_keys_orderings() makes 3 calls to > get_cheapest_group_keys_order() function. Each time > get_cheapest_group_keys_order() performs the cost estimate and > reorders the free keys. However, cost estimation implies the system > catalog lookups (that is quite expensive). I wonder if we could > change the algorithm. Could we just sort the group-by keys by cost > once, save this ordering and then just re-use it. So, every time we > need to reorder a group by, we can just pull the required keys to the > top and use saved ordering for the rest. I also wonder if we could do > this once for add_paths_to_grouping_rel() and > create_partial_grouping_paths() calls. So, it probably should be > somewhere in create_ordinary_grouping_paths(). Thanks for the idea!> 4) I think we can do some optimizations when enable_incremental_sort > == off. Then in get_useful_group_keys_orderings() we should only deal > with input_path fully matching the group-by clause, and try only full > match of group-by output to the required order. Oh, we had designed before the incremental sort was invented. Will see what we can do here. [1] https://www.postgresql.org/message-id/60610df1-c32f-ebdf-e58c-7a664431f452%40enterprisedb.com -- regards, Andrei Lepikhov Postgres Professional
pgsql-hackers by date: