Re: POC: GROUP BY optimization - Mailing list pgsql-hackers
From | Alexander Korotkov |
---|---|
Subject | Re: POC: GROUP BY optimization |
Date | |
Msg-id | CAPpHfdsfHn89cAFEa6-6Xm1sB_xf8Rf2WNab6DQzb8Rvs0yD8g@mail.gmail.com Whole thread Raw |
In response to | Re: POC: GROUP BY optimization (Richard Guo <guofenglinux@gmail.com>) |
Responses |
Re: POC: GROUP BY optimization
|
List | pgsql-hackers |
Hi! Thank you for your review. The revised patchset is attached. On Tue, Jan 16, 2024 at 4:48 AM Richard Guo <guofenglinux@gmail.com> wrote: > On Mon, Jan 15, 2024 at 3:56 PM Alexander Korotkov <aekorotkov@gmail.com> wrote: >> >> On Mon, Jan 15, 2024 at 8:42 AM Richard Guo <guofenglinux@gmail.com> wrote: >> > On Mon, Jan 15, 2024 at 8:20 AM Alexander Korotkov <aekorotkov@gmail.com> wrote: >> >> >> >> Thank you for providing the test case relevant for this code change. >> >> The revised patch incorporating this change is attached. Now the >> >> patchset looks good to me. I'm going to push it if there are no >> >> objections. >> > >> > Seems I'm late for the party. Can we hold for several more days? I'd >> > like to have a review on this patch. >> >> Sure, NP. I'll hold it till your review. > > > Thanks. Appreciate that. > > I have briefly reviewed this patch and here are some comments. > > * When trying to match the ordering of GROUP BY to that of ORDER BY in > get_useful_group_keys_orderings, this patch checks against the length of > path->pathkeys. This does not make sense. I guess this is a typo and > the real intention is to check against root->sort_pathkeys. > > --- a/src/backend/optimizer/path/pathkeys.c > +++ b/src/backend/optimizer/path/pathkeys.c > @@ -504,7 +504,7 @@ get_useful_group_keys_orderings(PlannerInfo *root, Path *path) > root->num_groupby_pathkeys); > > if (n > 0 && > - (enable_incremental_sort || n == list_length(path->pathkeys))) > + (enable_incremental_sort || n == list_length(root->sort_pathkeys))) > > However, I think this is also incorrect. When incremental sort is > disabled, it is reasonable to consider reordering the GROUP BY keys only > if the number of matching pathkeys is equal to the length of > root->group_pathkeys i.e. if 'n == list_length(root->group_pathkeys)'. Hmm... Why should this be list_length(root->group_pathkeys) while we're targeting root->sort_pathkeys. I yet changed that to list_length(root->sort_pathkeys). > * When the original ordering of GROUP BY keys matches the path's > pathkeys or ORDER BY keys, get_useful_group_keys_orderings would > generate duplicate PathKeyInfos. Consequently, this duplication would > lead to the creation and addition of identical paths multiple times. > This is not great. Consider the query below: > > create table t (a int, b int); > create index on t (a, b); > set enable_hashagg to off; > > explain select count(*) from t group by a, b; > QUERY PLAN > ---------------------------------------------------------------------------------- > GroupAggregate (cost=0.15..97.27 rows=226 width=16) > Group Key: a, b > -> Index Only Scan using t_a_b_idx on t (cost=0.15..78.06 rows=2260 width=8) > (3 rows) > > The same path with group keys {a, b} is created and added twice. I tried to address that. > * Part of the work performed in this patch overlaps with that of > preprocess_groupclause. They are both trying to adjust the ordering of > the GROUP BY keys to match ORDER BY. I wonder if it would be better to > perform this work only once. Andrei, could you take a look. > * When reordering the GROUP BY keys, I think the following checks need > some improvements. > > + /* > + * Since 1349d27 pathkey coming from underlying node can be in the > + * root->group_pathkeys but not in the processed_groupClause. So, we > + * should be careful here. > + */ > + sgc = get_sortgroupref_clause_noerr(pathkey->pk_eclass->ec_sortref, > + *group_clauses); > + if (!sgc) > + /* The grouping clause does not cover this pathkey */ > + break; > > I think we need to check or assert 'pathkey->pk_eclass->ec_sortref' is > valid before calling get_sortgroupref_clause_noerr with it. Also, how > can we guarantee that the returned GROUP BY item is sortable? Should we > check that with OidIsValid(sgc->sortop)? Added. ------ Regards, Alexander Korotkov
Attachment
pgsql-hackers by date: