Re: [HACKERS] Partition-wise aggregation/grouping - Mailing list pgsql-hackers

From Jeevan Chalke
Subject Re: [HACKERS] Partition-wise aggregation/grouping
Date
Msg-id CAM2+6=XV6KVhB5jFrxYK+gOvKRUjr8oM1rJrupycNx51RtCw+A@mail.gmail.com
Whole thread Raw
In response to Re: [HACKERS] Partition-wise aggregation/grouping  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: [HACKERS] Partition-wise aggregation/grouping
List pgsql-hackers


On Tue, Jan 16, 2018 at 3:41 AM, Robert Haas <robertmhaas@gmail.com> wrote:
On Thu, Jan 11, 2018 at 6:00 AM, Jeevan Chalke
<jeevan.chalke@enterprisedb.com> wrote:
>  Attached new set of patches adding this. Only patch 0007 (main patch) and
> 0008 (testcase patch) has changed.
>
> Please have a look and let me know if I missed any.

I spent a little time studying 0001 and 0002 today, as well as their
relation with 0007.  I find the way that the refactoring has been done
slightly odd.  With 0001 and 0002 applied, we end up with three
functions for creating aggregate paths: create_partial_agg_path, which
handles the partial-path case for both sort and hash;
create_sort_agg_path, which handles the sort case for non-partial
paths only; and create_hash_agg_path, which handles the hash case for
non-partial paths only.  This leads to the following code in 0007:

+               /* Full grouping paths */
+
+               if (try_parallel_aggregation)
+               {
+                       Assert(extra->agg_partial_costs &&
extra->agg_final_costs);
+                       create_partial_agg_path(root, input_rel,
grouped_rel, target,
+
 partial_target, extra->agg_partial_costs,
+
 extra->agg_final_costs, gd, can_sort,
+
 can_hash, (List *) extra->havingQual);
+               }
+
+               if (can_sort)
+                       create_sort_agg_path(root, input_rel,
grouped_rel, target,
+
partial_target, agg_costs,
+
extra->agg_final_costs, gd, can_hash,
+
dNumGroups, (List *) extra->havingQual);
+
+               if (can_hash)
+                       create_hash_agg_path(root, input_rel,
grouped_rel, target,
+
partial_target, agg_costs,
+
extra->agg_final_costs, gd, dNumGroups,
+                                                                (List
*) extra->havingQual);

That looks strange -- you would expect to see either "sort" and "hash"
cases here, or maybe "partial" and "non-partial", or maybe all four
combinations, but seeing three things here looks surprising.  I think
the solution is just to create a single function that does both the
work of create_sort_agg_path and the work of create_hash_agg_path
instead of having two separate functions.

In existing code (in create_grouping_paths()), I see following pattern:
    if (try_parallel_aggregation)
        if (can_sort)
        if (can_hash)
    if (can_sort)
    if (can_hash)

And thus, I have created three functions to match with existing pattern.

I will make your suggested changes that is merge create_sort_agg_path() and
create_hash_agg_path(). Will name that function as
create_sort_and_hash_agg_paths().
 

A related thing that is also surprising is that 0007 manages to reuse
create_partial_agg_path for both the isPartialAgg and non-isPartialAgg
cases -- in fact, the calls appear to be identical, and could be
hoisted out of the "if" statement

Yes. We can do that as well and I think it is better too.
I was just trying to preserve the existing pattern. So for PWA I chose:
    if partialAgg
        if (try_parallel_aggregation)
            if (can_sort)
            if (can_hash)
        if (can_sort)
        if (can_hash)
    else fullAgg
        if (try_parallel_aggregation)
            if (can_sort)
            if (can_hash)
        if (can_sort)
        if (can_hash)

But since, if (try_parallel_aggregation) case is exactly same, I will pull
that out of if..else.

 
-- but create_sort_agg_path and
create_hash_agg_path do not get reused.  I think you should see
whether you can define the new combo function that can be used for
both cases.  The logic looks very similar, and I'm wondering why it
isn't more similar than it is; for instance, create_sort_agg_path
loops over the input rel's pathlist, but the code for
isPartialAgg/can_sort seems to consider only the cheapest path.  If
this is correct, it needs a comment explaining it, but I don't see why
it should be correct.

Oops. My mistake. Missed. We should loop over the input rel's pathlist.

Yep. With above change, the logic is very similar except
(1) isPartialAgg/can_sort case creates the partial paths and
(2) finalization step is not needed at this stage.

I think it can be done by passing a flag to create_sort_agg_path() (or new
combo function) and making appropriate adjustments. Do you think addition of
this new flag should go in re-factoring patch or main PWA patch?
I think re-factoring patch.
 

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



--
Jeevan Chalke
Technical Architect, Product Development
EnterpriseDB Corporation
The Enterprise PostgreSQL Company

pgsql-hackers by date:

Previous
From: "REIX, Tony"
Date:
Subject: RE:[HACKERS] Deadlock in XLogInsert at AIX
Next
From: Kyotaro HORIGUCHI
Date:
Subject: Re: [HACKERS] Restricting maximum keep segments by repslots