Re: POC: GROUP BY optimization - Mailing list pgsql-hackers

From Alexander Korotkov
Subject Re: POC: GROUP BY optimization
Date
Msg-id CAPpHfdsfHn89cAFEa6-6Xm1sB_xf8Rf2WNab6DQzb8Rvs0yD8g@mail.gmail.com
Whole thread Raw
In response to Re: POC: GROUP BY optimization  (Richard Guo <guofenglinux@gmail.com>)
Responses Re: POC: GROUP BY optimization
List pgsql-hackers
Hi!

Thank you for your review.  The revised patchset is attached.

On Tue, Jan 16, 2024 at 4:48 AM Richard Guo <guofenglinux@gmail.com> wrote:
> On Mon, Jan 15, 2024 at 3:56 PM Alexander Korotkov <aekorotkov@gmail.com> wrote:
>>
>> On Mon, Jan 15, 2024 at 8:42 AM Richard Guo <guofenglinux@gmail.com> wrote:
>> > On Mon, Jan 15, 2024 at 8:20 AM Alexander Korotkov <aekorotkov@gmail.com> wrote:
>> >>
>> >> Thank you for providing the test case relevant for this code change.
>> >> The revised patch incorporating this change is attached.  Now the
>> >> patchset looks good to me.  I'm going to push it if there are no
>> >> objections.
>> >
>> > Seems I'm late for the party.  Can we hold for several more days?  I'd
>> > like to have a review on this patch.
>>
>> Sure, NP.  I'll hold it till your review.
>
>
> Thanks.  Appreciate that.
>
> I have briefly reviewed this patch and here are some comments.
>
> * When trying to match the ordering of GROUP BY to that of ORDER BY in
> get_useful_group_keys_orderings, this patch checks against the length of
> path->pathkeys.  This does not make sense.  I guess this is a typo and
> the real intention is to check against root->sort_pathkeys.
>
> --- a/src/backend/optimizer/path/pathkeys.c
> +++ b/src/backend/optimizer/path/pathkeys.c
> @@ -504,7 +504,7 @@ get_useful_group_keys_orderings(PlannerInfo *root, Path *path)
>                                            root->num_groupby_pathkeys);
>
>         if (n > 0 &&
> -           (enable_incremental_sort || n == list_length(path->pathkeys)))
> +           (enable_incremental_sort || n == list_length(root->sort_pathkeys)))
>
> However, I think this is also incorrect.  When incremental sort is
> disabled, it is reasonable to consider reordering the GROUP BY keys only
> if the number of matching pathkeys is equal to the length of
> root->group_pathkeys i.e. if 'n == list_length(root->group_pathkeys)'.

Hmm... Why should this be list_length(root->group_pathkeys) while
we're targeting root->sort_pathkeys.  I yet changed that to
list_length(root->sort_pathkeys).

> * When the original ordering of GROUP BY keys matches the path's
> pathkeys or ORDER BY keys, get_useful_group_keys_orderings would
> generate duplicate PathKeyInfos.  Consequently, this duplication would
> lead to the creation and addition of identical paths multiple times.
> This is not great.  Consider the query below:
>
> create table t (a int, b int);
> create index on t (a, b);
> set enable_hashagg to off;
>
> explain select count(*) from t group by a, b;
>                                     QUERY PLAN
> ----------------------------------------------------------------------------------
>  GroupAggregate  (cost=0.15..97.27 rows=226 width=16)
>    Group Key: a, b
>    ->  Index Only Scan using t_a_b_idx on t  (cost=0.15..78.06 rows=2260 width=8)
> (3 rows)
>
> The same path with group keys {a, b} is created and added twice.

I tried to address that.

> * Part of the work performed in this patch overlaps with that of
> preprocess_groupclause.  They are both trying to adjust the ordering of
> the GROUP BY keys to match ORDER BY.  I wonder if it would be better to
> perform this work only once.

Andrei, could you take a look.

> * When reordering the GROUP BY keys, I think the following checks need
> some improvements.
>
> +       /*
> +        * Since 1349d27 pathkey coming from underlying node can be in the
> +        * root->group_pathkeys but not in the processed_groupClause. So, we
> +        * should be careful here.
> +        */
> +       sgc = get_sortgroupref_clause_noerr(pathkey->pk_eclass->ec_sortref,
> +                                           *group_clauses);
> +       if (!sgc)
> +           /* The grouping clause does not cover this pathkey */
> +           break;
>
> I think we need to check or assert 'pathkey->pk_eclass->ec_sortref' is
> valid before calling get_sortgroupref_clause_noerr with it.  Also, how
> can we guarantee that the returned GROUP BY item is sortable?  Should we
> check that with OidIsValid(sgc->sortop)?

Added.

------
Regards,
Alexander Korotkov

Attachment

pgsql-hackers by date:

Previous
From: Ashutosh Bapat
Date:
Subject: Re: Report planning memory in EXPLAIN ANALYZE
Next
From: Alexander Korotkov
Date:
Subject: Re: POC PATCH: copy from ... exceptions to: (was Re: VLDB Features)