Re: [HACKERS] Partition-wise aggregation/grouping - Mailing list pgsql-hackers

From Ashutosh Bapat
Subject Re: [HACKERS] Partition-wise aggregation/grouping
Date
Msg-id CAFjFpRfHhKwVHGkPyVXysBu2OgPng6DXQvU+qwZdu30AO+8d4A@mail.gmail.com
Whole thread Raw
In response to Re: [HACKERS] Partition-wise aggregation/grouping  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
On Fri, Mar 16, 2018 at 12:16 AM, Robert Haas <robertmhaas@gmail.com> wrote:
>
> - partial_costs_set.  The comment in compute_group_path_extra_data
> doesn't look good.  It says "Set partial aggregation costs if we are
> going to calculate partial aggregates in make_grouping_rels()", but
> what it means is something more like "agg_partial_costs and
> agg_final_costs are not valid yet".  I also wonder if there's a way
> that we can figure out in advance whether we're going to need to do
> this and just do it at the appropriate place in the code, as opposed
> to doing it lazily.  Even if there were rare cases where we did it
> unnecessarily I'm not sure that would be any big deal.

I struggled with this one. In case of multi-level partitioning we will
compute fully aggregated results per partition in the levels for which
the partition keys are covered by GROUP BY clause. But beyond those
levels we will compute partial aggregates. When none of the levels can
use parallelism, only those levels which compute the partial
aggregates need these costs to be calculated. We will need to traverse
partitioning hierarchy before even starting aggregation to decide
whether we will need partial aggregation downwards. Instead of adding
that walker, I thought it better to use this flag. But more on this in
reply to your next mail.

>
> I wonder if we could simplify things by copying more information from
> the parent grouping rel to the child grouping rels.  It seems to me
> for example that the way you're computing consider_parallel for the
> child relations is kind of pointless.  The parallel-safety of the
> grouping_target can't vary across children, nor can that of the
> havingQual; the only thing that can change is whether the input rel
> has consider_parallel set.  You could cater to that by having
> GroupPathExtraData do something like extra.grouping_is_parallel_safe =
> target_parallel_safe && is_parallel_safe(root, havingQual) and then
> set each child's consider parallel flag to
> input_rel->consider_parallel && extra.grouping_is_parallel_safe.

I am actually confused by the current code itself. What parallel
workers compute is partial target which is different from the full
target. The partial target would only contain expressions in the
GROUPBY clause and partial aggregate nodes. It will not contain any
expressions comprising of full aggregates. When partial aggregate
nodes are parallel safe but the expressions using the full aggregates
are not parallel safe, the current code will not allow parallel
aggregation to take place whereas it should. That looks like an
optimization we are missing today.

That bug aside, the point is the target of grouped relation and that
of the partially grouped relation are different, giving rise to the
possibility that one is parallel safe and other is not. In fact, for
partially grouped relation, we shouldn't check parallel safety of
havingQual since havingQual is only applicable over the fully
aggregated result. That seems to be another missing optimization. OR I
am missing something here.

In case of partition-wise aggregates some levels may be performing
only partial aggregation and if partial aggregation is parallel safe,
we should allow those levels to run parallel partial aggregation.
Checking parallel safety of only grouped relation doesn't help here.
But since the optimization is already missing right now, I think this
patch shouldn't bother about it.

>
> Similarly, right now the way the patch sets the reltargets for
> grouping rels and partially grouping rels is a bit complex.
> make_grouping_rels() calls make_partial_grouping_target() separately
> for each partial grouping rel, but for non-partial grouping rels it
> gets the translated tlist as an argument.  Could we instead consider
> always building the tlist by translation from the parent, that is, a
> child grouped rel's tlist is the translation of the parent
> grouped_rel's tlist, and the child partially grouped rel's tlist is
> the translation of the parent partially_grouped_rel's tlist?  If you
> could both make that work and find a different place to compute the
> partial agg costs, make_grouping_rels() would get a lot simpler or
> perhaps go away entirely.

Hmm that's a thought. While we are translating, we allocate new nodes,
whereas make_partial_grouping_target() uses same nodes from the full
target. For a partially grouped child relation, this means that we
will allocate nodes to create partial target as well and then setrefs
will spend cycles matching node trees instead of matching pointers.
But I think we can take that hit if it saves us some complexity in the
code.

>
> I don't like this condition which appears in that function:
>
>     if (extra->try_parallel_aggregation || force_partial_agg ||
>         (extra->partitionwise_grouping &&
>          extra->partial_partitionwise_grouping))
>
> The problem with that is that it's got to exactly match the criteria
> for whether we're going to need the partial_grouping_rel.  If it's
> true when we are not using partial paths, then you've missed an
> optimization;  in the reverse case, we'll probably crash or fail to
> consider paths we should have considered.

>  It is not entirely
> straightforward to verify that this test is correct.
> add_paths_to_partial_grouping_rel() gets called if
> extra->try_parallel_aggregation is true or if
> extra->is_partial_aggregation is true, but the condition doesn't test
> extra->is_partial_aggregation at all.

Why do we need to test extra->is_partial_aggregation? We are testing
force_partial_agg. I agree that we should probably test
is_partial_aggregation, but that doesn't make this condition wrong.

> The other way that we can end up
> using partially_grouped_rel is if create_partitionwise_grouping_paths
> is called, but it just silently fails to do anything if we have no
> partially_grouped_rel.

It will create partially_grouped_rel when partition-wise grouping
requires it. So, this sentence seems to contradict itself. I am
confused.

> Moreover, even if it's
> correct now, I think that the chances that the next person who
> modifies this code will manage to keep it correct are not great.  I
> think we need to create the partial grouping rel somewhere in the code
> that's closer to where it's actually needed, so that we don't have so
> much action at a distance, or at least have a simpler and more
> transparent set of tests.

+1. I agree with that.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company


pgsql-hackers by date:

Previous
From: Ashutosh Bapat
Date:
Subject: Re: [HACKERS] Partition-wise aggregation/grouping
Next
From: Jason Petersen
Date:
Subject: Re: PostgreSQL opens all the indexes of a relation for every Queryduring Planning Time?