Re: Parallel Aggregate - Mailing list pgsql-hackers

From Robert Haas
Subject Re: Parallel Aggregate
Date
Msg-id CA+TgmoZf_fKVhn7UD-gHcsdj7pEK1td4qQ3uexs=M-L=y72-Hg@mail.gmail.com
Whole thread Raw
In response to Re: Parallel Aggregate  (David Rowley <david.rowley@2ndquadrant.com>)
Responses Re: Parallel Aggregate
List pgsql-hackers
On Mon, Mar 14, 2016 at 7:56 PM, David Rowley
<david.rowley@2ndquadrant.com> wrote:
>> More generally, why are we inventing PartialAggref
>> instead of reusing Aggref?  The code comments seem to contain no
>> indication as to why we shouldn't need all the same details for
>> PartialAggref that we do for Aggref, instead of only a small subset of
>> them.  Hmm... actually, it looks like PartialAggref is intended to
>> wrap Aggref, but that seems like a weird design.  Why not make Aggref
>> itself DTRT?   There's not enough explanation in the patch of what is
>> going on here and why.
>
> A comment does explain this, but perhaps it's not good enough, so I've
> rewritten it to become.
>
> /*
>  * PartialAggref
>  *
>  * When partial aggregation is required in a plan, the nodes from the partial
>  * aggregate node, up until the finalize aggregate node must pass the partially
>  * aggregated states up the plan tree. In regards to target list construction
>  * in setrefs.c, this requires that exprType() returns the state's type rather
>  * than the final aggregate value's type, and since exprType() for Aggref is
>  * coded to return the aggtype, this is not correct for us. We can't fix this
>  * by going around modifying the Aggref to change it's return type as setrefs.c
>  * requires searching for that Aggref using equals() which compares all fields
>  * in Aggref, and changing the aggtype would cause such a comparison to fail.
>  * To get around this problem we wrap the Aggref up in a PartialAggref, this
>  * allows exprType() to return the correct type and we can handle a
>  * PartialAggref in setrefs.c by just peeking inside the PartialAggref to check
>  * the underlying Aggref. The PartialAggref lives as long as executor start-up,
>  * where it's removed and replaced with it's underlying Aggref.
>  */
> typedef struct PartialAggref
>
> does that help explain it better?

I still think that's solving the problem the wrong way.  Why can't
exprType(), when applied to the Aggref, do something like this?

{   Aggref *aref = (Aggref *) expr;   if (aref->aggpartial)       return aref->aggtranstype;   else       return
aref->aggtype;
}

The obvious answer is "well, because those fields don't exist in
Aggref".  But shouldn't they?  From here, it looks like PartialAggref
is a cheap hack around not having whacked Aggref around hard for
partial aggregation.

>> I don't see where this applies has_parallel_hazard or anything
>> comparable to the aggregate functions.  I think it needs to do that.
>
> Not sure what you mean here.

If the aggregate isn't parallel-safe, you can't do this optimization.
For example, imagine an aggregate function written in PL/pgsql that
for some reason writes data to a side table.  It's
has_parallel_hazard's job to check the parallel-safety properties of
the functions used in the query.

>> +       /* XXX +1 ? do we expect the main process to actually do real work? */
>> +       numPartialGroups = Min(numGroups, subpath->rows) *
>> +                                               (subpath->parallel_degree + 1);
>> I'd leave out the + 1, but I don't think it matters much.
>
> Actually I meant to ask you about this. I see that subpath->rows is
> divided by the Path's parallel_degree, but it seems the main process
> does some work too, so this is why I added + 1, as during my tests
> using a query which produces 10 groups, and had 4 workers, I noticed
> that Gather was getting 50 groups back, rather than 40, I assumed this
> is because the main process is helping too, but from my reading of the
> parallel query threads I believe this is because the Gather, instead
> of sitting around idle tries to do a bit of work too, if it appears
> that nothing else is happening quickly enough. I should probably go
> read nodeGather.c to learn that though.

Yes, the main process does do some work, but less and less as the
query gets more complicated.  See comments in cost_seqscan().

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



pgsql-hackers by date:

Previous
From: Aleksander Alekseev
Date:
Subject: Re: Small patch: fix warnings during compilation on FreeBSD
Next
From: Robert Haas
Date:
Subject: Re: RFC: replace pg_stat_activity.waiting with something more descriptive