Re: [HACKERS] Partition-wise aggregation/grouping - Mailing list pgsql-hackers

From Jeevan Chalke
Subject Re: [HACKERS] Partition-wise aggregation/grouping
Date
Msg-id CAM2+6=Xidt0DBct7gvFhDNYSU+iFMW9EAQehs1Y-bZZHrXb6qg@mail.gmail.com
Whole thread Raw
In response to Re: [HACKERS] Partition-wise aggregation/grouping  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: [HACKERS] Partition-wise aggregation/grouping  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers


On Wed, Mar 21, 2018 at 7:46 PM, Robert Haas <robertmhaas@gmail.com> wrote:
On Wed, Mar 21, 2018 at 8:01 AM, Jeevan Chalke
<jeevan.chalke@enterprisedb.com> wrote:
>> In the patch as proposed, create_partial_grouping_paths() can get
>> called even if GROUPING_CAN_PARTIAL_AGG is not set.  I think that's
>> wrong.
>
> I don't think so. For parallel case, we do check that. And for partitionwise
> aggregation check, it was checked inside can_partitionwise_grouping()
> function and flags were set accordingly. Am I missing something?

Well, one of us is missing something somewhere.  If
GROUPING_CAN_PARTIAL_AGG means that we're allowed to do partial
grouping, and if create_partial_grouping_paths() is where partial
grouping happens, then we should only be calling the latter if the
former is set.  I mean, how can it make sense to create
partially-grouped paths if we're not allowed to do partial grouping?

Yes, that's true. If we are not allowed to do partial grouping, partially_grouped paths should not be created.

However, what I mean is that the partitionwise related checks added, if evaluates to true it implies that GROUPING_CAN_PARTIAL_AGG is also set as it was checked earlier. And thus does not need explicit check again.

Anyway, after your refactoring, it becomes more readable now.
 

> I have tweaked these conditions and posted in a separate patch (0006).
> However, I have merged all your three patches in one (0005).

OK, thanks.  I wasn't sure I had understood what was going on, so
thanks for checking it.

Thanks also for keeping 0004-0006 separate here, but I think you can
flatten them into one patch in the next version.

OK. Sure.
 

>> I think that if the last test in can_partitionwise_grouping were moved
>> before the previous test, it could be simplified to test only
>> (extra->flags & GROUPING_CAN_PARTIAL_AGG) == 0 and not
>> *perform_partial_partitionwise_aggregation.
>
> I think we can't do this way. If *perform_partial_partitionwise_aggregation
> found to be true then only we need to check whether partial aggregation
> itself is possible or not. If we are going to perform a full partitionwise
> aggregation then test for can_partial_agg is not needed. Have I misread your
> comments?

It seems you're correct, because when I change it the tests fail.  I
don't yet understand why.

Basically, the main patch seems to use three Boolean signaling mechanisms:

1. GROUPING_CAN_PARTITIONWISE_AGG
2. is_partial_aggregation
3. perform_partial_partitionwise_aggregation

Stuff I don't understand:

- Why is one of them a Boolean shoved into "flags", even though it's
not static across the whole hierarchy like the other flags, and the
other two are separate Booleans?
- What do they all do, anyway?

Let me try to explain this:

1. GROUPING_CAN_PARTITIONWISE_AGG
Tells us whether or not partitionwise grouping and/or aggregation is ever possible. If it is FALSE, other two have no meaning and they will be useless. However, if it is TRUE, then only we attempt to create paths partitionwise.
I have kept it in "flags" as it looks similar in behavior with other flag members like can_sort, can_hash etc. And, for given grouped relation whether parent or child, they all work similarly. But yes, for child relation, we inherit can_sort/can_hash from the parent as they won't change. But need to evaluate this for every child.
If required, I can move that to a GroupPathExtraData struct.

2. extra->is_partial_aggregation
This boolean var is used to identify at any given time whether we are computing a full aggregation or a partial aggregation. This boolean is necessary when doing partial aggregation to skip finalization. And also tells us to use partially_grouped_rel when true.

3. extra->perform_partial_partitionwise_aggregation
This boolean var is used to instruct child that it has to create a partially aggregated paths when TRUE. And then it transferred to child_extra->is_partial_aggregation in create_partitionwise_grouping_paths().

Basically (3) is required as we wanted to create a partially_grouped_rel upfront. So that if the child is going to create a partially aggregated paths, they can append those into the parent's partially grouped rel and thus we need to create that before even we enter into the child paths creation.
Since (3) is only valid if (1) is true, we need to compute (1) upfront too.
 

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



--
Jeevan Chalke
Technical Architect, Product Development
EnterpriseDB Corporation
The Enterprise PostgreSQL Company

pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: WIP: a way forward on bootstrap data
Next
From: Craig Ringer
Date:
Subject: Re: handling of heap rewrites in logical decoding