Home > mailing lists

Re: POC: GROUP BY optimization - Mailing list pgsql-hackers

From	Andrei Lepikhov
Subject	Re: POC: GROUP BY optimization
Date	January 18, 2024 09:48:54
Msg-id	f7a8729d-86d4-43ad-8b68-a7a3ace9d0fc@postgrespro.ru Whole thread
In response to	Re: POC: GROUP BY optimization (Alexander Korotkov <aekorotkov@gmail.com>)
Responses	Re: POC: GROUP BY optimization Re: POC: GROUP BY optimization
List	pgsql-hackers

Tree view

On 16/1/2024 22:05, Alexander Korotkov wrote:
> On Tue, Jan 16, 2024 at 4:48 AM Richard Guo <guofenglinux@gmail.com> wrote:
>> * When trying to match the ordering of GROUP BY to that of ORDER BY in
>> get_useful_group_keys_orderings, this patch checks against the length of
>> path->pathkeys.  This does not make sense.  I guess this is a typo and
>> the real intention is to check against root->sort_pathkeys.
Thanks! It is really my blunder - fresh look works.
>>
>> --- a/src/backend/optimizer/path/pathkeys.c
>> +++ b/src/backend/optimizer/path/pathkeys.c
>> @@ -504,7 +504,7 @@ get_useful_group_keys_orderings(PlannerInfo *root, Path *path)
>>                                             root->num_groupby_pathkeys);
>>
>>          if (n > 0 &&
>> -           (enable_incremental_sort || n == list_length(path->pathkeys)))
>> +           (enable_incremental_sort || n == list_length(root->sort_pathkeys)))
>>
>> However, I think this is also incorrect.  When incremental sort is
>> disabled, it is reasonable to consider reordering the GROUP BY keys only
>> if the number of matching pathkeys is equal to the length of
>> root->group_pathkeys i.e. if 'n == list_length(root->group_pathkeys)'.
> 
> Hmm... Why should this be list_length(root->group_pathkeys) while
> we're targeting root->sort_pathkeys.  I yet changed that to
> list_length(root->sort_pathkeys).
I think, in the first case, when we are trying to arrange group-by keys 
according to underlying pathkeys with incremental sort = off, it makes 
sense to do if we fetch all group-by keys regardless of a more or equal 
number of path keys the incoming path contains. The code and test case 
are in step1.txt.
> 
>> * When the original ordering of GROUP BY keys matches the path's
>> pathkeys or ORDER BY keys, get_useful_group_keys_orderings would
>> generate duplicate PathKeyInfos.  Consequently, this duplication would
>> lead to the creation and addition of identical paths multiple times.
>> This is not great.  Consider the query below:
>>
>> create table t (a int, b int);
>> create index on t (a, b);
>> set enable_hashagg to off;
>>
>> explain select count(*) from t group by a, b;
>>                                      QUERY PLAN
>> ----------------------------------------------------------------------------------
>>   GroupAggregate  (cost=0.15..97.27 rows=226 width=16)
>>     Group Key: a, b
>>     ->  Index Only Scan using t_a_b_idx on t  (cost=0.15..78.06 rows=2260 width=8)
>> (3 rows)
>>
>> The same path with group keys {a, b} is created and added twice.
> 
> I tried to address that.
> 
>> * Part of the work performed in this patch overlaps with that of
>> preprocess_groupclause.  They are both trying to adjust the ordering of
>> the GROUP BY keys to match ORDER BY.  I wonder if it would be better to
>> perform this work only once.
> 
> Andrei, could you take a look.
As I see, the PathKeyInfo list often contains duplicated pathkeys, 
coming from either sort_pathkeys or path->pathkeys orderings. So, I can 
propose to check duplicates each time (see step2.txt in attachment).
> 
>> * When reordering the GROUP BY keys, I think the following checks need
>> some improvements.
>>
>> +       /*
>> +        * Since 1349d27 pathkey coming from underlying node can be in the
>> +        * root->group_pathkeys but not in the processed_groupClause. So, we
>> +        * should be careful here.
>> +        */
>> +       sgc = get_sortgroupref_clause_noerr(pathkey->pk_eclass->ec_sortref,
>> +                                           *group_clauses);
>> +       if (!sgc)
>> +           /* The grouping clause does not cover this pathkey */
>> +           break;
>>
>> I think we need to check or assert 'pathkey->pk_eclass->ec_sortref' is
>> valid before calling get_sortgroupref_clause_noerr with it.  Also, how
>> can we guarantee that the returned GROUP BY item is sortable?  Should we
>> check that with OidIsValid(sgc->sortop)?
> 
> Added.
Reviewed it, looks good.

-- 
regards,
Andrei Lepikhov
Postgres Professional

Attachment

step1.txt

pgsql-hackers by date:

From: Amit Kapila
Date: 18 January 2024, 09:43:56
Subject: Re: speed up a logical replica setup

From: Matthias van de Meent
Date: 18 January 2024, 09:49:48
Subject: Re: cleanup patches for incremental backup

Re: POC: GROUP BY optimization - Mailing list pgsql-hackers

Attachment

Previous

Next