Home > mailing lists

Re: POC: GROUP BY optimization - Mailing list pgsql-hackers

From	Alexander Korotkov
Subject	Re: POC: GROUP BY optimization
Date	December 21, 2023 10:53:27
Msg-id	CAPpHfdtzaVa7S4onKy3YvttF2rrH5hQNHx9HtcSTLbpjx+MJ+w@mail.gmail.com Whole thread Raw
In response to	Re: POC: GROUP BY optimization (Andrei Lepikhov <a.lepikhov@postgrespro.ru>)
Responses	Re: POC: GROUP BY optimization Re: POC: GROUP BY optimization
List	pgsql-hackers

Tree view

Hi!

On Sun, Oct 1, 2023 at 11:45 AM Andrei Lepikhov
<a.lepikhov@postgrespro.ru> wrote:
>
> New version of the patch. Fixed minor inconsistencies and rebased onto
> current master.

Thank you (and other authors) for working on this subject.  Indeed to
GROUP BY clauses are order-agnostic.  Reordering them in the most
suitable order could give up significant query planning benefits.  I
went through the thread: I see significant work has been already made
on this patch, the code is quite polished.

I'd like to make some notes.

1) As already mentioned, there is clearly a repetitive pattern for the
code following after get_useful_group_keys_orderings() calls.  I think
it would be good to extract it into a separate function.  Please, do
this as a separate patch coming before the group-by patch. That would
simplify the review.

2) I wonder what planning overhead this patch could introduce?  Could
you try to measure the worst case?  What if we have a table with a lot
of indexes and a long list of group-by clauses partially patching
every index.  This should give us an understanding on whether we need
a separate GUC to control this feature.

3) I see that get_useful_group_keys_orderings() makes 3 calls to
get_cheapest_group_keys_order() function.  Each time
get_cheapest_group_keys_order() performs the cost estimate and
reorders the free keys.  However, cost estimation implies the system
catalog lookups (that is quite expensive).  I wonder if we could
change the algorithm.  Could we just sort the group-by keys by cost
once, save this ordering and then just re-use it.  So, every time we
need to reorder a group by, we can just pull the required keys to the
top and use saved ordering for the rest.  I also wonder if we could do
this once for add_paths_to_grouping_rel() and
create_partial_grouping_paths() calls.  So, it probably should be
somewhere in create_ordinary_grouping_paths().

4) I think we can do some optimizations when enable_incremental_sort
== off.  Then in get_useful_group_keys_orderings() we should only deal
with input_path fully matching the group-by clause, and try only full
match of group-by output to the required order.

------
Regards,
Alexander Korotkov

pgsql-hackers by date:

From: Pavel Stehule
Date: 21 December 2023, 10:46:29
Subject: Re: [DOC] Introducing Quick Start Guide to PL/pgSQL and PL/Python Documentation

From: Christoph Berg
Date: 21 December 2023, 10:56:22
Subject: int4->bool test coverage

Re: POC: GROUP BY optimization - Mailing list pgsql-hackers

Previous

Next