Re: [HACKERS] Partition-wise aggregation/grouping - Mailing list pgsql-hackers

From Robert Haas
Subject Re: [HACKERS] Partition-wise aggregation/grouping
Date
Msg-id CA+TgmoZ+ZJTVad-=vEq393N99KTooxv9k7M+z73qnTAqkb49BQ@mail.gmail.com
Whole thread Raw
In response to Re: [HACKERS] Partition-wise aggregation/grouping  (Ashutosh Bapat <ashutosh.bapat@enterprisedb.com>)
Responses Re: [HACKERS] Partition-wise aggregation/grouping  (Jeevan Chalke <jeevan.chalke@enterprisedb.com>)
List pgsql-hackers
On Fri, Mar 16, 2018 at 1:50 PM, Ashutosh Bapat
<ashutosh.bapat@enterprisedb.com> wrote:
> Ok. That looks good.

Here's an updated version.  In this version, based on a voice
discussion with Ashutosh and Jeevan, I adjusted 0001 to combine it
with an earlier idea of splitting Gather/Gather Merge path generation
out of the function that creates partially aggregated paths.  The idea
here is that create_ordinary_gather_paths() could first call
create_partial_grouping_paths(), then add additional paths which might
be partial or non-partial by invoking the partition-wise aggregate
logic, then call gather_grouping_paths() and set_cheapest() to
finalize the partially grouped rel.  Also, I added draft commit
messages.

With this patch set applied, the key bit of logic in
create_ordinary_grouping_paths() ends up looking like this:

    if (grouped_rel->consider_parallel && input_rel->partial_pathlist != NIL
        && (flags & GROUPING_CAN_PARTIAL_AGG) != 0)
    {
        partially_grouped_rel =
            create_partial_grouping_paths(root,
                                          grouped_rel,
                                          input_rel,
                                          gd,
                                          can_sort,
                                          can_hash,
                                          &agg_final_costs);
        gather_grouping_paths(root, partially_grouped_rel);
        set_cheapest(partially_grouped_rel);
    }

I imagine that what the main partition-wise aggregate patch would do
is (1) change the conditions under which
create_partial_grouping_paths() gets called, (2) postpone
gather_grouping_paths() and set_cheapest() until after partition-wise
aggregate had been done, doing them only if partially_grouped_rel !=
NULL.  Partition-wise aggregate will need to happen before
add_paths_to_grouping_rel(), though, so that the latter function can
try a FinalizeAggregate node on top of an Append added by
partition-wise aggregate.

This is a bit strange, because it will mean that partition-wise
aggregate will be attempted BEFORE adding ordinary aggregate paths to
grouped_rel but AFTER adding them to partially_grouped_rel.  We could
fix that by splitting add_paths_to_grouping_rel() into two functions,
one of which performs full aggregation directly and the other of which
tries finishing partial aggregation.  I'm unsure that's a good idea
though: it would mean that we have very similar logic in two different
functions that could get out of sync as a result of future code
changes, and it's not really fixing any problem.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachment

pgsql-hackers by date:

Previous
From: John Naylor
Date:
Subject: Re: MCV lists for highly skewed distributions
Next
From: Robert Haas
Date:
Subject: Re: [HACKERS] Partition-wise aggregation/grouping