Re: using custom scan nodes to prototype parallel sequential scan - Mailing list pgsql-hackers

From Atri Sharma
Subject Re: using custom scan nodes to prototype parallel sequential scan
Date
Msg-id CAOeZVicxW7nzw=kRcSAGxLQ51aBemnU_hJDjBm-yx39g-=XKSg@mail.gmail.com
In response to Re: using custom scan nodes to prototype parallel sequential scan  (David Rowley <dgrowleyml@gmail.com>)
List pgsql-hackers


On Wed, Nov 12, 2014 at 1:24 PM, David Rowley <dgrowleyml@gmail.com> wrote:

On Tue, Nov 11, 2014 at 9:29 PM, Simon Riggs <simon@2ndquadrant.com> wrote:

This plan type is widely used in reporting queries, so will hit the
mainline of BI applications and many Mat View creations.
This will allow SELECT count(*) FROM foo to go faster also.


We'd also need to add some infrastructure to merge aggregate states together for this to work properly. That means it could also work for avg(), stddev(), etc. For max() and min(), the merge functions would likely just be the same as the transition functions.
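To make the idea of merging aggregate states concrete, here is a minimal sketch in Python rather than backend C. All names (AvgState, avg_transition, avg_merge, max_transition and so on) are hypothetical and purely illustrative, not existing PostgreSQL functions. It assumes avg() keeps a (count, sum) pair as its state; stddev() would presumably need to carry a sum of squares as well.

```python
from dataclasses import dataclass


@dataclass
class AvgState:
    """Hypothetical partial state for avg(): running row count and sum."""
    count: int = 0
    total: float = 0.0


def avg_transition(state, value):
    # Transition function: fold one input value into the state.
    return AvgState(state.count + 1, state.total + value)


def avg_merge(a, b):
    # Merge function: combine two partial states built by different workers.
    return AvgState(a.count + b.count, a.total + b.total)


def avg_final(state):
    # Final function: turn the state into the result (None for no rows).
    return state.total / state.count if state.count else None


def max_transition(state, value):
    # Transition function for max(); None stands for "no value seen yet".
    if value is None:
        return state
    if state is None or value > state:
        return value
    return state


# For max(), merging two partial states is the same operation as the
# transition function, since a partial state is itself just a value.
max_merge = max_transition


def run_worker(values):
    # Simulate one worker aggregating its share of the rows.
    avg_state, max_state = AvgState(), None
    for v in values:
        avg_state = avg_transition(avg_state, v)
        max_state = max_transition(max_state, v)
    return avg_state, max_state


rows = [3.0, 9.0, 4.0, 8.0]
avg_a, max_a = run_worker(rows[:2])   # first worker's share
avg_b, max_b = run_worker(rows[2:])   # second worker's share

merged_avg = avg_final(avg_merge(avg_a, avg_b))
merged_max = max_merge(max_a, max_b)
```

The merged results match what a single pass over all four rows would produce, which is the property any merge function would have to guarantee.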


It might make sense to make a new planner operator which can be responsible for pulling from each of the individual parallel Agg nodes and then aggregating over the results.
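As a rough illustration of what such an operator might do, the sketch below (Python, with hypothetical names partial_agg and combine_agg, not an actual planner or executor API) has each worker build per-group partial states, and a combining step that pulls from every worker and merges states group by group. It assumes a (count, sum) state per group, which is enough for count(*) and avg().

```python
from collections import defaultdict


def partial_agg(rows):
    """Worker-side Agg: group (key, value) rows, keep [count, sum] per group."""
    states = defaultdict(lambda: [0, 0])
    for key, value in rows:
        state = states[key]
        state[0] += 1
        state[1] += value
    return dict(states)


def combine_agg(partials):
    """Combining operator: merge the partial states from every worker,
    then finalize each group as (count, avg)."""
    merged = defaultdict(lambda: [0, 0])
    for partial in partials:
        for key, (count, total) in partial.items():
            merged[key][0] += count
            merged[key][1] += total
    return {key: (count, total / count) for key, (count, total) in merged.items()}


# Two workers, each scanning a different chunk of the table.
chunks = [
    [("a", 1), ("b", 10)],
    [("a", 3), ("b", 20), ("a", 5)],
]
result = combine_agg(partial_agg(chunk) for chunk in chunks)
```

Note that the combining step only ever sees one state per group per worker, so its cost depends on the number of groups rather than the number of input rows.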


A couple of things might be worth considering. One is whether we want to enforce parallel aggregation or let the planner decide between a parallel aggregate and a single (non-parallel) plan. For example, the average estimated group size might be one thing the planner considers when choosing between a parallel and a single execution plan.

I don't see merging states as an easy problem; it should perhaps be tackled separately from this thread.

Also, do we want to allow parallelism only with GroupAggs?

Regards,

Atri
