Home > mailing lists

Re: [HACKERS] Partition-wise aggregation/grouping - Mailing list pgsql-hackers

From	Jeevan Chalke
Subject	Re: [HACKERS] Partition-wise aggregation/grouping
Date	March 21, 2018 18:01:28
Msg-id	CAM2+6=VSC-TvUxSOJ62UCTAgTWv2JvmyeBafLLzpfvNg=yi_0Q@mail.gmail.com Whole thread Raw
In response to	Re: [HACKERS] Partition-wise aggregation/grouping (Robert Haas <robertmhaas@gmail.com>)
Responses	Re: [HACKERS] Partition-wise aggregation/grouping (Robert Haas <robertmhaas@gmail.com>)
List	pgsql-hackers

Tree view

On Wed, Mar 21, 2018 at 2:04 AM, Robert Haas <robertmhaas@gmail.com> wrote:

On Tue, Mar 20, 2018 at 10:46 AM, Jeevan Chalke
<jeevan.chalke@enterprisedb.com> wrote:
> I have added all these three patches in the attached patch-set and rebased
> my changes over it.
>
> However, I have not yet made this patch-set dependednt on UPPERREL_TLIST
> changes you have proposed on another mail-thread and thus it has 0001 patch
> refactoring the scanjoin issue.
> 0002, 0003 and 0004 are your patches added in this patchset.
> 0005 and 0006 are further refactoring patches. 0006 adds a
> GroupPathExtraData which stores mostly child variant data and costs.
> 0007 is main partitionwise aggregation patch which is then rebased
> accordingly.
> 0008 contains testcase and 0009 contains FDW changes.

Committed my refactoring patches (your 0002-0004).

Thanks Robert.

Regarding apply_scanjoin_target_to_paths in 0001 and 0007, it seems
like what happens is: we first build an Append path for the topmost
scan/join rel. That uses paths from the individual relations that
don't necessarily produce the final scan/join target. Then we mutate
those relations in place during partition-wise aggregate so that they
now do produce the final scan/join target and generate some more paths
using the results. So there's an ordering dependency, and the same
pathlist represents different things at different times. That is, I
suppose, not technically any worse than what we're doing for the
scan/join rel's pathlist in general, but here there's the additional
complexity that the paths get used both before and after being
mutated. The UPPERREL_TLIST proposal would clean this up, although I
realize that has unresolved issues.

In create_partial_grouping_paths, the loop that does "for (i = 0; i <
2; i++)" is not exactly what I had in mind when I said that we should
use two loops. I did not mean a loop with two iterations. I meant
adding a loop like foreach(lc, input_rel->pathlist) in each place
where we currently have a loop like
foreach(input_rel->partial_pathlist).

The path creation logic for partial_pathlist and pathlist was identical and thus I thought of just loop over it twice switching the pathlist, so that we have minimal code to maintain. But yes I agree that it adds additional complexity.

See 0001, attached.

Looks great. Thanks.

Don't write if (a) Assert(b) but rather Assert(!a || b). See 0002, attached.

OK. Noted.

In the patch as proposed, create_partial_grouping_paths() can get
called even if GROUPING_CAN_PARTIAL_AGG is not set. I think that's
wrong.

I don't think so. For parallel case, we do check that. And for partitionwise aggregation check, it was checked inside can_partitionwise_grouping() function and flags were set accordingly. Am I missing something?

If can_partial_agg() isn't accurately determining whether
partial aggregation is possible,

I think it does accurately determine.
if (!parse->hasAggs && parse->groupClause == NIL)
is only valid for DISTINCT queries which we are anyway not handling here and for partitionwise aggregate it won't be true otherwise it will be a degenerate grouping case.

and as Ashutosh and I have been
discussing, there's room for improvement in that area, then that's a
topic for some other set of patches. Also, the test in
create_ordinary_grouping_paths for whether or not to call
create_partial_grouping_paths() is super-complicated and uncommented.
I think a simpler approach is to allow create_partial_grouping_paths()
the option of returning NULL. See 0003, attached.

Thanks for simplifying it.

However, after this simplification, we were unnecessary creating non-parallel partial aggregation paths for the root input rel when not needed.
Consider a case where we need a partial aggregation from a child, in this case, extra->is_partial_aggregation = 0 at root level entry as the parent is still doing full aggregation but perform_partial_partitionwise_aggregation is true, which tells a child to perform partial partitionwise aggregation. In this case, cheapest_total_path will be set and thus we will go ahead and create partial aggregate paths for the parent rel, which is not needed.

I have tweaked these conditions and posted in a separate patch (0006).
However, I have merged all your three patches in one (0005).

make_grouping_rel() claims that "For now, all aggregated paths are
added to the (GROUP_AGG, NULL) upperrel", but this is false: we no
longer have only one grouped upper rel.

Done.

I'm having a heck of a time understanding what is_partial_aggregation
and perform_partial_partitionwise_aggregation are supposed to be
doing.

As you said correctly, is_partial_aggregation denotes that we are doing ONLY a partial aggregation at this level of partitioning hierarchy whereas perform_partial_partitionwise_aggregation is used to instruct the child whether it should perform partial or full aggregation at its level. Since we need to create a partially_grouped_rel we evaluate all these possibilities and thus need to pass those to the child so that child will not need to compute it again.

It seems like is_partial_aggregation means that we should ONLY
do partial aggregation, which is not exactly what the name implies.

I think it says we are doing a partial aggregation and thus implies to use partially_grouped_rel and skip finalization step.

It also seems like perform_partial_partitionwise_aggregation and
is_partial_aggregation differ only in that they apply to the current
level and to the child level respectively;

It's the other way. is_partial_aggregation is applied to the current level and perform_partial_partitionwise_aggregation applies to child level.

can't we merge these
somehow so that we don't need both of them?

I think we can't as they apply at two different levels. A scenario in which we are doing full aggregation at level 1 and need to perform partial aggregation at level 2, they are different. But yes, they both will be same if both the levels are doing same. But can't merge those.

Do you think any better names as it seems confusing?

I think that if the last test in can_partitionwise_grouping were moved
before the previous test, it could be simplified to test only
(extra->flags & GROUPING_CAN_PARTIAL_AGG) == 0 and not
*perform_partial_partitionwise_aggregation.

I think we can't do this way. If *perform_partial_partitionwise_aggregation found to be true then only we need to check whether partial aggregation itself is possible or not. If we are going to perform a full partitionwise aggregation then test for can_partial_agg is not needed. Have I misread your comments?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Jeevan Chalke
Technical Architect, Product Development
EnterpriseDB Corporation
The Enterprise PostgreSQL Company

Attachment

partition-wise-agg-v22.tar.gz

pgsql-hackers by date:

From: Peter Eisentraut
Date: 21 March 2018, 17:48:02
Subject: Re: [HACKERS] taking stdbool.h into use

From: Michael Banck
Date: 21 March 2018, 18:12:45
Subject: pg_basebackup: Missing newlines in some error messages

Re: [HACKERS] Partition-wise aggregation/grouping - Mailing list pgsql-hackers

Attachment

Previous

Next