Re: [HACKERS] Partition-wise aggregation/grouping - Mailing list pgsql-hackers

From Robert Haas
Subject Re: [HACKERS] Partition-wise aggregation/grouping
Date
Msg-id CA+TgmoaMoCsyon12LJMVSj1knMevOZQvxowO5C+FY+ZSOaQULg@mail.gmail.com
Whole thread Raw
In response to Re: [HACKERS] Partition-wise aggregation/grouping  (Jeevan Chalke <jeevan.chalke@enterprisedb.com>)
Responses Re: [HACKERS] Partition-wise aggregation/grouping  (Ashutosh Bapat <ashutosh.bapat@enterprisedb.com>)
Re: [HACKERS] Partition-wise aggregation/grouping  (Jeevan Chalke <jeevan.chalke@enterprisedb.com>)
List pgsql-hackers
On Wed, Mar 21, 2018 at 11:33 AM, Jeevan Chalke
<jeevan.chalke@enterprisedb.com> wrote:
> Let me try to explain this:
> 1. GROUPING_CAN_PARTITIONWISE_AGG
> 2. extra->is_partial_aggregation
> 3. extra->perform_partial_partitionwise_aggregation

Please find attached an incremental patch that attempts to refactor
this logic into a simpler form.  What I've done is merged all three of
the above Booleans into a single state variable called 'patype', which
can be one of PARTITIONWISE_AGGREGATE_NONE,
PARTITIONWISE_AGGREGATE_FULL, and PARTITIONWISE_AGGREGATE_PARTIAL.
When create_ordinary_grouping_paths() is called, extra.patype is the
value for the parent relation; that function computes a new value and
passes it down to create_partitionwise_grouping_paths(), which inserts
into the new 'extra' structure for the child.

Essentially, in your system, extra->is_partial_aggregation and
extra->perform_partial_partitionwise_aggregation both corresponded to
whether or not patype == PARTITIONWISE_AGGREGATE_PARTIAL, but the
former indicated whether the *parent* was doing partition-wise
aggregation (and thus we needed to generate only partial paths) while
the latter indicated whether the *current* relation was doing
partition-wise aggregation (and thus we needed to force creation of
partially_grouped_rel).  This took me a long time to understand
because of the way the fields were named; they didn't indicate that
one was for the parent and one for the current relation.  Meanwhile,
GROUPING_CAN_PARTITIONWISE_AGG indicated whether partition-wise
aggregate should be tried at all for the current relation; there was
no analogous indicator for the parent relation because we can't be
processing a child at all if the parent didn't decide to do
partition-wise aggregation.  So to know what was happening for the
current relation you had to look at GROUPING_CAN_PARTITIONWISE_AGG +
extra->perform_partial_partitionwise_aggregation, and to know what was
happening for the parent relation you just looked at
extra->is_partial_aggregation.  With this proposed refactoring patch,
there's just one patype value at each level, which at least to me
seems simpler.  I tried to improve the comments somewhat, too.

You have some long lines that it would be good to break, like this:

         child_extra.targetList = (List *) adjust_appendrel_attrs(root,

(Node *) extra->targetList,
                                                                  nappinfos,
                                                                  appinfos);

If you put a newline after (List *), the formatting will come out
nicer -- it will fit within 80 columns.  Please go through the patches
and make these kinds of changes for lines over 80 columns where
possible.

I guess we'd better move the GROUPING_CAN_* constants to a header
file, if they're going to be exposed through GroupPathExtraData.  That
can go in some refactoring patch.

Is there a good reason not to use input_rel->relids as the input to
fetch_upper_rel() in all cases, rather than just at subordinate
levels?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachment

pgsql-hackers by date:

Previous
From: Thomas Munro
Date:
Subject: Re: JIT compiling with LLVM v12.2
Next
From: Andres Freund
Date:
Subject: Re: JIT compiling with LLVM v12.2