Re: [HACKERS] Partition-wise aggregation/grouping - Mailing list pgsql-hackers
From | Ashutosh Bapat |
---|---|
Subject | Re: [HACKERS] Partition-wise aggregation/grouping |
Date | |
Msg-id | CAFjFpReiaKX516dYv67-YXpG_os0w-V_kPBvMz0H7URyLnQ7yw@mail.gmail.com Whole thread Raw |
In response to | Re: [HACKERS] Partition-wise aggregation/grouping (Jeevan Chalke <jeevan.chalke@enterprisedb.com>) |
Responses |
Re: [HACKERS] Partition-wise aggregation/grouping
(Robert Haas <robertmhaas@gmail.com>)
Re: [HACKERS] Partition-wise aggregation/grouping (Jeevan Chalke <jeevan.chalke@enterprisedb.com>) |
List | pgsql-hackers |
On Wed, Mar 7, 2018 at 8:07 PM, Jeevan Chalke <jeevan.chalke@enterprisedb.com> wrote: Here are some more review comments esp. on try_partitionwise_grouping() function. BTW name of that function doesn't go in sync with enable_partitionwise_aggregation (which btw is in sync with enable_fooagg GUCs). But it goes in sync with create_grouping_paths(). It looks like we have already confused aggregation and grouping e.g. enable_hashagg may affect path creation when there is no aggregation involved i.e. only grouping is involved but create_grouping_paths will create paths when there is no grouping involved. Generally it looks like the function names use grouping to mean both aggregation and grouping but GUCs use aggregation to mean both of those. So, the naming in this patch looks consistent with the current naming conventions. + grouped_rel->part_scheme = input_rel->part_scheme; + grouped_rel->nparts = nparts; + grouped_rel->boundinfo = input_rel->boundinfo; + grouped_rel->part_rels = part_rels; You need to set the part_exprs which will provide partition keys for this partitioned relation. I think, we should include all the part_exprs of input_rel which are part of GROUP BY clause. Since any other expressions in part_exprs are not part of GROUP BY clause, they can not appear in the targetlist without an aggregate on top. So they can't be part of the partition keys of the grouped relation. In create_grouping_paths() we fetch both partial as well as fully grouped rel for given input relation. But in case of partial aggregation, we don't need fully grouped rel since we are not computing full aggregates for the children. Since fetch_upper_rel() creates a relation when one doesn't exist, we are unnecessarily creating fully grouped rels in this case. For thousands of partitions that's a lot of memory wasted. I see a similar issue with create_grouping_paths() when we are computing only full aggregates (either because partial aggregation is not possible or because parallelism is not possible). In that case, we unconditionally create partially grouped rels. That too would waste a lot of memory. I think that partially_grouped_rel, when required, is partitioned irrespective of whether we do full aggregation per partition or not. So, if we have its part_rels and other partitioning properties set. I agree that right now we won't use this information anywhere. It may be useful, in case we device a way to use partially_grouped_rel directly without using grouped_rel for planning beyond grouping, which seems unlikely. + + /* + * Parallel aggregation requires partial target, so compute it here + * and translate all vars. For partial aggregation, we need it + * anyways. + */ + partial_target = make_partial_grouping_target(root, target, + havingQual); Don't we have this available in partially_grouped_rel? That shows one asymmetry that Robert's refactoring has introduced. We don't set reltarget of grouped_rel but set reltarget of partially_grouped_rel. If reltarget of grouped_rel changes across paths (the reason why we have not set it in grouped_rel), shouldn't reltarget of partially grouped paths change accordingly? + +/* + * group_by_has_partkey + * + * Returns true, if all the partition keys of the given relation are part of + * the GROUP BY clauses, false otherwise. + */ +static bool +group_by_has_partkey(RelOptInfo *input_rel, PathTarget *target, + List *groupClause) We could modify this function to return the list of part_exprs which are part of group clause and use that as the partition keys of the grouped_rel if required. If group by doesn't have all the partition keys, the function would return a NULL list. Right now, in case of full aggregation, partially_grouped_rel is populated with the partial paths created by adding partial aggregation to the partial paths of input relation. But we are not trying to create partial paths by (parallel) appending the (non)partial paths from the child partially_grouped_rel. Have we thought about that? Would such paths have different shapes from the ones that we create now and will they be better? -- Best Wishes, Ashutosh Bapat EnterpriseDB Corporation The Postgres Database Company
pgsql-hackers by date: