Re: Parallel Aggregate - Mailing list pgsql-hackers
From | David Rowley |
---|---|
Subject | Re: Parallel Aggregate |
Date | |
Msg-id | CAKJS1f_vRdjpL-CDHvkPWtXE9GN5v8TcBxNpUAejRJ872RgWAA@mail.gmail.com Whole thread Raw |
In response to | Re: Parallel Aggregate (Haribabu Kommi <kommi.haribabu@gmail.com>) |
Responses |
Re: Parallel Aggregate
Re: Parallel Aggregate |
List | pgsql-hackers |
On 17 February 2016 at 17:50, Haribabu Kommi <kommi.haribabu@gmail.com> wrote: > Here I attached a draft patch based on previous discussions. It still needs > better comments and optimization. Over in [1] Tom posted a large change to the grouping planner which causes large conflict with the parallel aggregation patch. I've been looking over Tom's patch and reading the related thread and I've observed 3 things: 1. Parallel Aggregate will be much easier to write and less code to base it up top of Tom's upper planner changes. The latest patch does add a bit of cruft (e.g create_gather_plan_from_subplan()) which won't be required after Tom pushes the changes to the upper planner. 2. If we apply parallel aggregate before Tom's upper planner changes go in, then Tom needs to reinvent it again when rebasing his patch. This seems senseless, so this is why I did this work. 3. Based on the thread, most people are leaning towards getting Tom's changes in early to allow a bit more settle time before beta, and perhaps also to allow other patches to go in after (e.g this) So, I've done a bit of work and I've rewritten the parallel aggregate code to base it on top of Tom's patch posted in [1]. There's a few things that are left unsolved at this stage. 1. exprType() for Aggref still returns the aggtype, where it needs to return the trans type for partial agg nodes, this need to return the trans type rather than the aggtype. I had thought I might fix this by adding a proxy node type that sits in the targetlist until setrefs.c where it can be plucked out and replaced by the Aggref. I need to investigate this further. 2. There's an outstanding bug relating to HAVING clause not seeing the right state of aggregation and returning wrong results. I've not had much time to look into this yet, but I suspect its an existing bug that's already in master from my combine aggregate patch. I will investigate this on Sunday. In regards to the patch, there's a few things worth mentioning here: 1. I've had to add a parallel_degree parameter to create_group_path() and create_agg_path(). I think Tom is going to make changes to his patch so that the Path's parallel_degree is propagated to subnodes, this should allow me to remove this parameter and just use parallel_degree the one from the subpath. 2. I had to add a new parameter to pass an optional row estimate to cost_gather() as I don't have a RelOptInfo available to get a row estimate from which represents the state after partial aggregation. I thought this change was ok, but I'll listen to anyone who thinks of a better way to do it. 3. The code never attempts to mix and match Grouping Agg and Hash Agg plans. e.g it could be an idea to perform Partial Hash Aggregate -> Gather -> Sort -> Finalize Group Aggregate, or hash as in the Finalize stage. I just thought doing this is more complex than what's really needed, but if someone can think of a case where this would be a great win then I'll listen, but you have to remember we don't have any pre-sorted partial paths at this stage, so an explicit sort is required *always*. This might change if someone invented partial btree index scans... but until then... Due to the existence of the outstanding issues above, I feel like I might be posting the patch a little earlier, but wanted to do so since this is quite a hot area in the code at the moment and I wanted to post for transparency. To apply the patch please apply [1] first. [1] http://www.postgresql.org/message-id/3795.1456689808@sss.pgh.pa.us -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Attachment
pgsql-hackers by date: