Re: Parallel Aggregate - Mailing list pgsql-hackers
From | David Rowley |
---|---|
Subject | Re: Parallel Aggregate |
Date | |
Msg-id | CAKJS1f-q3ak=fP5SB-t5df8LCydPYWEdW7MQtEjYWn=X=Ju8mg@mail.gmail.com Whole thread Raw |
In response to | Re: Parallel Aggregate (Alvaro Herrera <alvherre@2ndquadrant.com>) |
Responses |
Re: Parallel Aggregate
|
List | pgsql-hackers |
On 19 March 2016 at 09:53, Alvaro Herrera <alvherre@2ndquadrant.com> wrote: > I read this a bit, as an exercise to try to follow parallel query a bit. Thanks for taking a look at this. > I think the most interesting thing I have to say is that the new error > messages in ExecInitExpr do not conform to our style. Probably just > downcase everything except Aggref and you're done, since they're > can't-happen conditions anyway. The comments below are mostly as I > learn how the whole thing works so if I'm talking nonsense, I'm happy to > be ignored or, better yet, educated. Per a comment above from Robert I ended up just removing most of that code due to the removal of Aggref->aggpartial. It did not seem worth keeping aggpartial around just for these errors either, but let me know if you think otherwise. > I think the way we set the "consider_parallel" flag is a bit odd (we > just "return" out of the function in the cases were it mustn't be set); > but that mechanism is already part of set_rel_consider_parallel and > similar to (but not quite like) longstanding routines such as > set_rel_width, so nothing new in this patch. I find this a bit funny > coding, but then this is the planner so maybe it's okay. hmm, I think its very similar to set_rel_consider_parallel(), as that just does return for non-supported cases, and if it makes it until the end of the function consider_parallel is set to true. Would you rather see; /* we can do nothing in parallel if there's no aggregates or group by */ if (!parse->hasAggs && parse->groupClause == NIL) grouped_rel->consider_parallel = false; /* grouping sets are currently not supported by parallel aggregate */ else if (parse->groupingSets) grouped_rel->consider_parallel = false; ...? > I think the comment on search_indexed_tlist_for_partial_aggref is a bit > bogus; it says it returns an existing Var, but what it does is > manufacture one itself. I *think* the code is all right, but the > comment seems misleading. Ok, I've changed it to: search_indexed_tlist_for_partial_aggref - find an Aggref in an indexed tlist which seems to be equivalent to search_indexed_tlist_for_non_var()'s comment. > In set_combineagg_references(), there are two calls to > fix_combine_agg_expr(); I think the one hanging on the > search_indexed_tlist_for_sortgroupref call is useless; you could just > have the "if newexpr != NULL" in the outer block (after initializing to > NULL before the ressortgroupref check). Yes, but see set_upper_references(), it just follows the pattern there but calls fix_combine_agg_expr() instead of fix_upper_expr(). For simplicity of review it seems to be nice to keep it following the same pattern. > set_combineagg_references's comment says it does something "similar to > set_upper_references, and additionally" some more stuff, but reading the > code for both functions I think that's not quite true. I think the > comment should say that both functions are parallel, but one works for > partial aggs and the other doesn't. Ok, I've changed the comment to: /* * set_combineagg_references * This does a similar job as set_upper_references(), but treats Aggrefs * in a different way. Here we transforms Aggref nodes args to suit the * combine aggregate phase. This means that the Aggref->args are converted * to reference the corresponding aggregate function in the subplan rather * than simple Var(s), as would be the case for a non-combine aggregate * node. */ > Actually, what happens if you feed > an agg plan with combineStates to set_upper_references? If it still > works but the result is not optimal, maybe we should check against that > case, so as to avoid the case where somebody hacks this further and the > plans are suddenly not agg-combined anymore. What I actually expect to > happen is that something would explode during execution; in that case > perhaps it's better to add a comment? (In further looking at other > setrefs.c similar functions, maybe it's fine the way you have it.) This simply won't work as this is the code which causes the sum((sum(num))) in the combine aggregate's target list. The normal code is trying to look for Vars, but this code is trying to find that sum(num) inside the subplan target list. Example EXPLAIN VERBOSE output: Finalize Aggregate (cost=105780.67..105780.68 rows=1 width=8) Output: pg_catalog.sum((sum(num))) Try commenting out; if (aggplan->combineStates) set_combineagg_references(root, plan, rtoffset); else to see the horrors than ensue. It'll basically turn aggregate functions in to a rather inefficient random number generator. This is because the combine Aggref->args still point to "num", instead of sum(num). > Back at make_partialgroup_input_target, the comment says "so that they > can be found later in Combine Aggregate nodes during setrefs". I think > it's better to be explicit and say ".. can be found later during > set_combineagg_references" or something. Changed. Thanks for the review. Updated patch is attached. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Attachment
pgsql-hackers by date: