Re: Parallel Aggregate - Mailing list pgsql-hackers
From | Robert Haas |
---|---|
Subject | Re: Parallel Aggregate |
Date | |
Msg-id | CA+TgmoZf_fKVhn7UD-gHcsdj7pEK1td4qQ3uexs=M-L=y72-Hg@mail.gmail.com Whole thread Raw |
In response to | Re: Parallel Aggregate (David Rowley <david.rowley@2ndquadrant.com>) |
Responses |
Re: Parallel Aggregate
|
List | pgsql-hackers |
On Mon, Mar 14, 2016 at 7:56 PM, David Rowley <david.rowley@2ndquadrant.com> wrote: >> More generally, why are we inventing PartialAggref >> instead of reusing Aggref? The code comments seem to contain no >> indication as to why we shouldn't need all the same details for >> PartialAggref that we do for Aggref, instead of only a small subset of >> them. Hmm... actually, it looks like PartialAggref is intended to >> wrap Aggref, but that seems like a weird design. Why not make Aggref >> itself DTRT? There's not enough explanation in the patch of what is >> going on here and why. > > A comment does explain this, but perhaps it's not good enough, so I've > rewritten it to become. > > /* > * PartialAggref > * > * When partial aggregation is required in a plan, the nodes from the partial > * aggregate node, up until the finalize aggregate node must pass the partially > * aggregated states up the plan tree. In regards to target list construction > * in setrefs.c, this requires that exprType() returns the state's type rather > * than the final aggregate value's type, and since exprType() for Aggref is > * coded to return the aggtype, this is not correct for us. We can't fix this > * by going around modifying the Aggref to change it's return type as setrefs.c > * requires searching for that Aggref using equals() which compares all fields > * in Aggref, and changing the aggtype would cause such a comparison to fail. > * To get around this problem we wrap the Aggref up in a PartialAggref, this > * allows exprType() to return the correct type and we can handle a > * PartialAggref in setrefs.c by just peeking inside the PartialAggref to check > * the underlying Aggref. The PartialAggref lives as long as executor start-up, > * where it's removed and replaced with it's underlying Aggref. > */ > typedef struct PartialAggref > > does that help explain it better? I still think that's solving the problem the wrong way. Why can't exprType(), when applied to the Aggref, do something like this? { Aggref *aref = (Aggref *) expr; if (aref->aggpartial) return aref->aggtranstype; else return aref->aggtype; } The obvious answer is "well, because those fields don't exist in Aggref". But shouldn't they? From here, it looks like PartialAggref is a cheap hack around not having whacked Aggref around hard for partial aggregation. >> I don't see where this applies has_parallel_hazard or anything >> comparable to the aggregate functions. I think it needs to do that. > > Not sure what you mean here. If the aggregate isn't parallel-safe, you can't do this optimization. For example, imagine an aggregate function written in PL/pgsql that for some reason writes data to a side table. It's has_parallel_hazard's job to check the parallel-safety properties of the functions used in the query. >> + /* XXX +1 ? do we expect the main process to actually do real work? */ >> + numPartialGroups = Min(numGroups, subpath->rows) * >> + (subpath->parallel_degree + 1); >> I'd leave out the + 1, but I don't think it matters much. > > Actually I meant to ask you about this. I see that subpath->rows is > divided by the Path's parallel_degree, but it seems the main process > does some work too, so this is why I added + 1, as during my tests > using a query which produces 10 groups, and had 4 workers, I noticed > that Gather was getting 50 groups back, rather than 40, I assumed this > is because the main process is helping too, but from my reading of the > parallel query threads I believe this is because the Gather, instead > of sitting around idle tries to do a bit of work too, if it appears > that nothing else is happening quickly enough. I should probably go > read nodeGather.c to learn that though. Yes, the main process does do some work, but less and less as the query gets more complicated. See comments in cost_seqscan(). -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
pgsql-hackers by date: