
From: David Rowley
Subject: Re: Parallel Aggregate
Msg-id: CAKJS1f8oTJ7_dFVFXc646uHE87TbBt5+bCm+c9J6t=r3EZRjsg@mail.gmail.com
In response to: Re: Parallel Aggregate (Robert Haas <robertmhaas@gmail.com>)
List: pgsql-hackers
On 16 March 2016 at 09:23, Robert Haas <robertmhaas@gmail.com> wrote:
> On Mon, Mar 14, 2016 at 7:56 PM, David Rowley
> <david.rowley@2ndquadrant.com> wrote:
>> A comment does explain this, but perhaps it's not good enough, so I've
>> rewritten it as follows:
>>
>> /*
>>  * PartialAggref
>>  *
>>  * When partial aggregation is required in a plan, the nodes from the partial
>>  * aggregate node up to the finalize aggregate node must pass the partially
>>  * aggregated states up the plan tree. With regard to target list construction
>>  * in setrefs.c, this requires that exprType() return the state's type rather
>>  * than the final aggregate value's type, and since exprType() for Aggref is
>>  * coded to return the aggtype, this is not correct for us. We can't fix this
>>  * by modifying the Aggref to change its return type, as setrefs.c
>>  * requires searching for that Aggref using equals() which compares all fields
>>  * in Aggref, and changing the aggtype would cause such a comparison to fail.
>>  * To get around this problem we wrap the Aggref up in a PartialAggref; this
>>  * allows exprType() to return the correct type, and we can handle a
>>  * PartialAggref in setrefs.c by just peeking inside it to check the
>>  * underlying Aggref. The PartialAggref survives until executor start-up,
>>  * where it's removed and replaced with its underlying Aggref.
>>  */
>> typedef struct PartialAggref
>>
>> does that help explain it better?
>
> I still think that's solving the problem the wrong way.  Why can't
> exprType(), when applied to the Aggref, do something like this?
>
> {
>     Aggref *aref = (Aggref *) expr;
>     if (aref->aggpartial)
>         return aref->aggtranstype;
>     else
>         return aref->aggtype;
> }
>
> The obvious answer is "well, because those fields don't exist in
> Aggref".  But shouldn't they?  From here, it looks like PartialAggref
> is a cheap hack around not having whacked Aggref around hard for
> partial aggregation.

We could do it that way if we left the aggpartial field out of the
equals() check, but I think we go to some lengths to avoid doing that;
just look at what's done for all of the location fields. In any case,
if we did that it might not actually be what we want all of the time:
perhaps in some cases we'd want equals() to return false when
aggpartial does not match, and in other cases we'd want it to return
true. There's no way to control that behaviour, so to get around the
setrefs.c problem I created the wrapper node type, which I happen to
think is quite clean. See Tom's comments in [1] about Haribabu's
temporary fix for the problem, where he put some hacks into the
equals() support for Aggref.
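
To make that a bit more concrete, the shape I have in mind is roughly
this (just a sketch; the field names here are illustrative rather than
necessarily what the patch ends up with):

typedef struct PartialAggref
{
    Expr        xpr;
    Oid         aggtranstype;   /* type of the partially aggregated state */
    Aggref     *aggref;         /* the unmodified, wrapped Aggref */
} PartialAggref;

/* ... which lets exprType() report the state's type for the partial phase */
case T_PartialAggref:
    type = ((const PartialAggref *) expr)->aggtranstype;
    break;

setrefs.c then only has to look through the wrapper to find the
underlying Aggref with equals(), and the Aggref itself never needs to
be modified.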

>>> I don't see where this applies has_parallel_hazard or anything
>>> comparable to the aggregate functions.  I think it needs to do that.
>>
>> Not sure what you mean here.
>
> If the aggregate isn't parallel-safe, you can't do this optimization.
> For example, imagine an aggregate function written in PL/pgsql that
> for some reason writes data to a side table.  It's
> has_parallel_hazard's job to check the parallel-safety properties of
> the functions used in the query.

Sorry, I see what you mean now. I may have been wrong to assume that
the parallelModeOK check covers this; I'll dig into exactly how that
flag gets set.
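
Presumably the end result needs to be a check along these lines before
we generate any partial aggregate paths (just a sketch on my part; the
call site and the variable names are assumptions, not what's in the
patch):

/*
 * Skip partial aggregation if anything in the target list or HAVING
 * clause is not parallel-safe.  "tlist" and parse->havingQual are
 * assumptions about where the aggregates live at this point.
 */
if (has_parallel_hazard((Node *) tlist, false) ||
    has_parallel_hazard((Node *) parse->havingQual, false))
    return;     /* don't generate partial aggregate paths */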

>>> +       /* XXX +1 ? do we expect the main process to actually do real work? */
>>> +       numPartialGroups = Min(numGroups, subpath->rows) *
>>> +                                               (subpath->parallel_degree + 1);
>>> I'd leave out the + 1, but I don't think it matters much.
>>
>> Actually, I meant to ask you about this. I see that subpath->rows is
>> divided by the Path's parallel_degree, but it seems the main process
>> does some work too, which is why I added the + 1. During my tests with
>> a query that produces 10 groups and 4 workers, I noticed that Gather
>> was getting 50 groups back rather than 40. I assumed this is because
>> the main process is helping too, but from my reading of the parallel
>> query threads I believe it's because the Gather, instead of sitting
>> around idle, tries to do a bit of work itself if it appears that
>> nothing else is happening quickly enough. I should probably go read
>> nodeGather.c to confirm that, though.
>
> Yes, the main process does do some work, but less and less as the
> query gets more complicated.  See comments in cost_seqscan().

Thanks
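
(For what it's worth, plugging my earlier test into that formula: with
10 groups and parallel_degree = 4, Min(10, subpath->rows) * (4 + 1) =
50, which matches the 50 groups I saw arriving at Gather; without the
+ 1 the estimate would have been 40.)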

[1] http://www.postgresql.org/message-id/10158.1457329140@sss.pgh.pa.us

--
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


