Re: Parallel Aggregate - Mailing list pgsql-hackers
| From | David Rowley |
|---|---|
| Subject | Re: Parallel Aggregate |
| Date | |
| Msg-id | CAKJS1f8oTJ7_dFVFXc646uHE87TbBt5+bCm+c9J6t=r3EZRjsg@mail.gmail.com |
| In response to | Re: Parallel Aggregate (Robert Haas <robertmhaas@gmail.com>) |
| Responses | Re: Parallel Aggregate |
| List | pgsql-hackers |
On 16 March 2016 at 09:23, Robert Haas <robertmhaas@gmail.com> wrote:
> On Mon, Mar 14, 2016 at 7:56 PM, David Rowley
> <david.rowley@2ndquadrant.com> wrote:
>> A comment does explain this, but perhaps it's not good enough, so I've
>> rewritten it to become:
>>
>> /*
>>  * PartialAggref
>>  *
>>  * When partial aggregation is required in a plan, the nodes from the partial
>>  * aggregate node up until the finalize aggregate node must pass the partially
>>  * aggregated states up the plan tree. With regard to target list construction
>>  * in setrefs.c, this requires that exprType() returns the state's type rather
>>  * than the final aggregate value's type, and since exprType() for Aggref is
>>  * coded to return the aggtype, this is not correct for us. We can't fix this
>>  * by modifying the Aggref to change its return type, as setrefs.c
>>  * requires searching for that Aggref using equals(), which compares all fields
>>  * in Aggref, and changing the aggtype would cause such a comparison to fail.
>>  * To get around this problem we wrap the Aggref up in a PartialAggref; this
>>  * allows exprType() to return the correct type, and we can handle a
>>  * PartialAggref in setrefs.c by just peeking inside it to check
>>  * the underlying Aggref. The PartialAggref lives until executor start-up,
>>  * where it's removed and replaced with its underlying Aggref.
>>  */
>> typedef struct PartialAggref
>>
>> does that help explain it better?
>
> I still think that's solving the problem the wrong way.  Why can't
> exprType(), when applied to the Aggref, do something like this?
>
> {
>     Aggref *aref = (Aggref *) expr;
>     if (aref->aggpartial)
>         return aref->aggtranstype;
>     else
>         return aref->aggtype;
> }
>
> The obvious answer is "well, because those fields don't exist in
> Aggref".  But shouldn't they?  From here, it looks like PartialAggref
> is a cheap hack around not having whacked Aggref around hard for
> partial aggregation.
We could do it that way if we left the aggpartial field out of the
equals() check, but I think we go to some lengths to avoid doing that;
just look at what's done for all of the location fields. In any case, if
we did that then it might not actually be what we want all of the time.
Perhaps in some cases we'd want equals() to return false when aggpartial
does not match, and in other cases we'd want it to return true. There's
no way to control that behaviour, so to get around the setrefs.c problem
I created the wrapper node type, which I happen to think is quite clean.
See Tom's comments in [1] about Haribabu's temporary fix for the
problem, where he put some hacks into the equals() handling for Aggref.

>>> I don't see where this applies has_parallel_hazard or anything
>>> comparable to the aggregate functions.  I think it needs to do that.
>>
>> Not sure what you mean here.
>
> If the aggregate isn't parallel-safe, you can't do this optimization.
> For example, imagine an aggregate function written in PL/pgsql that
> for some reason writes data to a side table.  It's
> has_parallel_hazard's job to check the parallel-safety properties of
> the functions used in the query.

Sorry, I now know what you mean by that. I might have been wrong to
assume that the parallelModeOK check did this. I will dig into how that
is set exactly.

>>> +     /* XXX +1 ? do we expect the main process to actually do real work? */
>>> +     numPartialGroups = Min(numGroups, subpath->rows) *
>>> +         (subpath->parallel_degree + 1);
>>>
>>> I'd leave out the + 1, but I don't think it matters much.
>>
>> Actually I meant to ask you about this.
>> I see that subpath->rows is divided by the Path's parallel_degree,
>> but it seems the main process does some work too, so this is why I
>> added + 1. During my tests using a query which produces 10 groups
>> with 4 workers, I noticed that Gather was getting 50 groups back
>> rather than 40. I assumed this is because the main process is helping
>> too, but from my reading of the parallel query threads I believe this
>> is because the Gather, instead of sitting around idle, tries to do a
>> bit of work too if it appears that nothing else is happening quickly
>> enough. I should probably go read nodeGather.c to learn that though.
>
> Yes, the main process does do some work, but less and less as the
> query gets more complicated.  See comments in cost_seqscan().

Thanks

[1] http://www.postgresql.org/message-id/10158.1457329140@sss.pgh.pa.us

--
David Rowley                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services