Re: POC: GROUP BY optimization - Mailing list pgsql-hackers
From | Andrei Lepikhov |
---|---|
Subject | Re: POC: GROUP BY optimization |
Date | |
Msg-id | f7a8729d-86d4-43ad-8b68-a7a3ace9d0fc@postgrespro.ru Whole thread Raw |
In response to | Re: POC: GROUP BY optimization (Alexander Korotkov <aekorotkov@gmail.com>) |
Responses |
Re: POC: GROUP BY optimization
Re: POC: GROUP BY optimization |
List | pgsql-hackers |
On 16/1/2024 22:05, Alexander Korotkov wrote: > On Tue, Jan 16, 2024 at 4:48 AM Richard Guo <guofenglinux@gmail.com> wrote: >> * When trying to match the ordering of GROUP BY to that of ORDER BY in >> get_useful_group_keys_orderings, this patch checks against the length of >> path->pathkeys. This does not make sense. I guess this is a typo and >> the real intention is to check against root->sort_pathkeys. Thanks! It is really my blunder - fresh look works. >> >> --- a/src/backend/optimizer/path/pathkeys.c >> +++ b/src/backend/optimizer/path/pathkeys.c >> @@ -504,7 +504,7 @@ get_useful_group_keys_orderings(PlannerInfo *root, Path *path) >> root->num_groupby_pathkeys); >> >> if (n > 0 && >> - (enable_incremental_sort || n == list_length(path->pathkeys))) >> + (enable_incremental_sort || n == list_length(root->sort_pathkeys))) >> >> However, I think this is also incorrect. When incremental sort is >> disabled, it is reasonable to consider reordering the GROUP BY keys only >> if the number of matching pathkeys is equal to the length of >> root->group_pathkeys i.e. if 'n == list_length(root->group_pathkeys)'. > > Hmm... Why should this be list_length(root->group_pathkeys) while > we're targeting root->sort_pathkeys. I yet changed that to > list_length(root->sort_pathkeys). I think, in the first case, when we are trying to arrange group-by keys according to underlying pathkeys with incremental sort = off, it makes sense to do if we fetch all group-by keys regardless of a more or equal number of path keys the incoming path contains. The code and test case are in step1.txt. > >> * When the original ordering of GROUP BY keys matches the path's >> pathkeys or ORDER BY keys, get_useful_group_keys_orderings would >> generate duplicate PathKeyInfos. Consequently, this duplication would >> lead to the creation and addition of identical paths multiple times. >> This is not great. Consider the query below: >> >> create table t (a int, b int); >> create index on t (a, b); >> set enable_hashagg to off; >> >> explain select count(*) from t group by a, b; >> QUERY PLAN >> ---------------------------------------------------------------------------------- >> GroupAggregate (cost=0.15..97.27 rows=226 width=16) >> Group Key: a, b >> -> Index Only Scan using t_a_b_idx on t (cost=0.15..78.06 rows=2260 width=8) >> (3 rows) >> >> The same path with group keys {a, b} is created and added twice. > > I tried to address that. > >> * Part of the work performed in this patch overlaps with that of >> preprocess_groupclause. They are both trying to adjust the ordering of >> the GROUP BY keys to match ORDER BY. I wonder if it would be better to >> perform this work only once. > > Andrei, could you take a look. As I see, the PathKeyInfo list often contains duplicated pathkeys, coming from either sort_pathkeys or path->pathkeys orderings. So, I can propose to check duplicates each time (see step2.txt in attachment). > >> * When reordering the GROUP BY keys, I think the following checks need >> some improvements. >> >> + /* >> + * Since 1349d27 pathkey coming from underlying node can be in the >> + * root->group_pathkeys but not in the processed_groupClause. So, we >> + * should be careful here. >> + */ >> + sgc = get_sortgroupref_clause_noerr(pathkey->pk_eclass->ec_sortref, >> + *group_clauses); >> + if (!sgc) >> + /* The grouping clause does not cover this pathkey */ >> + break; >> >> I think we need to check or assert 'pathkey->pk_eclass->ec_sortref' is >> valid before calling get_sortgroupref_clause_noerr with it. Also, how >> can we guarantee that the returned GROUP BY item is sortable? Should we >> check that with OidIsValid(sgc->sortop)? > > Added. Reviewed it, looks good. -- regards, Andrei Lepikhov Postgres Professional
Attachment
pgsql-hackers by date: