Re: [HACKERS] Partition-wise aggregation/grouping - Mailing list pgsql-hackers

From Jeevan Chalke
Subject Re: [HACKERS] Partition-wise aggregation/grouping
Date
Msg-id CAM2+6=UF7knvAUed59G-4HWJC+WewooRe23x5Egp1Sm7Poi4WQ@mail.gmail.com
Whole thread Raw
In response to Re: [HACKERS] Partition-wise aggregation/grouping  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: [HACKERS] Partition-wise aggregation/grouping
List pgsql-hackers


On Wed, Jan 17, 2018 at 1:18 AM, Robert Haas <robertmhaas@gmail.com> wrote:
On Tue, Jan 16, 2018 at 3:56 AM, Jeevan Chalke
<jeevan.chalke@enterprisedb.com> wrote:
> I will make your suggested changes that is merge create_sort_agg_path() and
> create_hash_agg_path(). Will name that function as
> create_sort_and_hash_agg_paths().

I suggest add_paths_to_grouping_rel() and
add_partial_paths_to_grouping_rel(), similar to what commit
c44c47a773bd9073012935a29b0264d95920412c did with
add_paths_to_append_rel().

> Oops. My mistake. Missed. We should loop over the input rel's pathlist.
>
> Yep. With above change, the logic is very similar except
> (1) isPartialAgg/can_sort case creates the partial paths and
> (2) finalization step is not needed at this stage.

I'm not sure what you mean by #1.

I mean, in case of isPartialAgg=true, we need to create a partial aggregation path which has aggsplit=AGGSPLIT_INITIAL_SERIAL and should not perform finalization at this stage. And thus add_paths_to_grouping_rel() needs a flag to diffrentiate it. By adding that code chunk allows us to reuse same function in both the cases i.e full and partial aggregation.

Attached patch with other review points fixed.


> I think it can be done by passing a flag to create_sort_agg_path() (or new
> combo function) and making appropriate adjustments. Do you think addition of
> this new flag should go in re-factoring patch or main PWA patch?
> I think re-factoring patch.

I think the refactoring patch should move the existing code into a new
function without any changes, and then the main patch should add an
additional argument to that function that allows for either behavior.

By the way, I'm also a bit concerned about this:

+               /*
+                * For full aggregation, we are done with the partial
paths.  Just
+                * clear it out so that we don't try to create a
parallel plan over it.
+                */
+               grouped_rel->partial_pathlist = NIL;

I think that's being done for the same reason as mentioned at the
bottom of the current code for create_grouping_paths().  They are only
partially aggregated and wouldn't produce correct final results if
some other planning step -- create_ordered_paths, or the code that
sets up final_rel -- used them as if they had been fully agggregated.
I'm worried that there might be an analogous danger for partition-wise
aggregation -- that is, that the paths being inserted into the partial
pathlists of the aggregate child rels might get reused by some later
planning step which doesn't realize that the output they produce
doesn't quite match up with the rel to which they are attached.  You
may have already taken care of that problem somehow, but we should
make sure that it's fully correct and clearly commented.  I don't
immediately see why the isPartialAgg case should be any different from
the !isPartialAgg case.

Actually I needed this because in case of full aggregation we already build all final aggregation paths before performing Append operation. However, when we do append in add_paths_to_append_rel(), it thinks that partial_pathlist present in grouped_rel is finalized one exactly like you mentioned above and thus we need to clear it out.

But yes, for safer side, I think once we done with partition-wise aggregation step, we need to again go through the partitioning chain and need to clear out all child grouped rel's partial_pathlist for the reason mentioned at the bottom of the current code for create_grouping_paths().
 

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



--
Jeevan Chalke
Technical Architect, Product Development
EnterpriseDB Corporation
The Enterprise PostgreSQL Company

Attachment

pgsql-hackers by date:

Previous
From: Michael Paquier
Date:
Subject: Re: [HACKERS] REL9_6_STABLE - a minor bug in src/common/exec.c
Next
From: Anastasia Lubennikova
Date:
Subject: Re: [HACKERS] WIP: Covering + unique indexes.