Re: Combining Aggregates - Mailing list pgsql-hackers

From David Rowley
Subject Re: Combining Aggregates
Date
Msg-id CAKJS1f9eH172rz0-3YXRZg+SsU+UkQB_uUkXHYdbaddaiYVcmw@mail.gmail.com
In response to Re: Combining Aggregates  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: Combining Aggregates  (Haribabu Kommi <kommi.haribabu@gmail.com>)
Re: Combining Aggregates  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
On 21 January 2016 at 04:59, Robert Haas <robertmhaas@gmail.com> wrote:
On Wed, Jan 20, 2016 at 7:53 AM, David Rowley
<david.rowley@2ndquadrant.com> wrote:
> On 21 January 2016 at 01:44, Robert Haas <robertmhaas@gmail.com> wrote:
>>
>> On Wed, Jan 20, 2016 at 7:38 AM, David Rowley
>> <david.rowley@2ndquadrant.com> wrote:
>> >> To my mind, priority #1 ought to be putting this fine new
>> >> functionality to some use.  Expanding it to every aggregate we've got
>> >> seems like a distinctly second priority.  That's not to say that it's
>> >> absolutely gotta go down that way, but those would be my priorities.
>> >
>> > Agreed. So I've attached a version of the patch which does not have any
>> > of
>> > the serialise/deserialise stuff in it.
>> >
>> > I've also attached a test patch which modifies the grouping planner to
>> > add a
>> > Partial Aggregate node, and a final aggregate node when it's possible.
>> > Running the regression tests with this patch only shows up variances in
>> > the
>> > EXPLAIN outputs, which is of course expected.
>>
>> That seems great as a test, but what's the first patch that can put
>> this to real and permanent use?
>
> There's no reason why parallel aggregates can't use the
> combine_aggregate_state_d6d480b_2016-01-21.patch patch.

I agree.  Are you going to work on that?  Are you expecting me to work
on that?  Do you think we can use Haribabu's patch?  What other
applications are in play in the near term, if any?

At the moment I think everything which will use this is queued up behind the pathification of the grouping planner which Tom is working on. I think naturally Parallel Aggregate makes sense to work on first, given all the other parallel stuff in this release. I plan on working on that by either assisting Haribabu, or... whatever else it takes.

The other two usages which I have thought of are:

1) Aggregating before UNION ALL, which might be fairly simple after the grouping planner changes, as it may just be a matter of considering another "grouping path" which partially aggregates before the UNION ALL, and performs the final grouping stage after UNION ALL. At this stage it's hard to say how that will work as I'm not sure how far changes to the grouping planner will go. Perhaps Tom can comment?

2) Group before join. e.g. select p.description,sum(s.qty) from sale s inner join product p on s.product_id = p.product_id group by p.description;  Here sale could be partially aggregated by s.product_id before the join, with the final grouping by p.description performed afterwards. I have a partial patch which implements this, although I was a bit stuck on whether I should invent the concept of "GroupingPaths", or just inject alternative subquery relations which are already grouped by the correct clause.  The problem with "GroupingPaths" was that the row estimates currently come from the RelOptInfo and are set in set_baserel_size_estimates(), which always assumes the ungrouped number of rows, which is not what's needed if the grouping is already performed. I was holding off to see how Tom does this in the grouping planner changes.
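
To make the two usages above concrete, here is a minimal sketch of the equivalent manual query rewrites. It is only an illustration under stated assumptions: the tables (sale, sale_archive, product) and their contents are hypothetical, SQLite (via Python's sqlite3 module) stands in for PostgreSQL, and the rewrites are written by hand rather than produced by any planner code. For sum(), the partial state is itself a sum, so the combine step is simply another sum().

```python
import sqlite3

# Hypothetical schema and data; SQLite stands in for PostgreSQL here.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE product (product_id INTEGER PRIMARY KEY, description TEXT);
    CREATE TABLE sale (product_id INTEGER, qty INTEGER);
    CREATE TABLE sale_archive (product_id INTEGER, qty INTEGER);
    INSERT INTO product VALUES (1, 'widget'), (2, 'widget'), (3, 'gadget');
    INSERT INTO sale VALUES (1, 5), (1, 7), (2, 1), (3, 10), (3, 2);
    INSERT INTO sale_archive VALUES (1, 4), (3, 6);
""")


def run(sql):
    """Execute a query and return all rows as a list of tuples."""
    return conn.execute(sql).fetchall()


# Usage 1: aggregate once over all rows appended by UNION ALL ...
UNION_THEN_GROUP = """
    SELECT product_id, sum(qty) FROM (
        SELECT product_id, qty FROM sale
        UNION ALL
        SELECT product_id, qty FROM sale_archive
    ) AS u GROUP BY product_id ORDER BY product_id
"""

# ... versus a partial sum per branch, combined after the UNION ALL.
GROUP_THEN_UNION = """
    SELECT product_id, sum(partial_qty) FROM (
        SELECT product_id, sum(qty) AS partial_qty
        FROM sale GROUP BY product_id
        UNION ALL
        SELECT product_id, sum(qty) AS partial_qty
        FROM sale_archive GROUP BY product_id
    ) AS u GROUP BY product_id ORDER BY product_id
"""

# Usage 2: join every sale row, then group ...
JOIN_THEN_GROUP = """
    SELECT p.description, sum(s.qty)
    FROM sale s INNER JOIN product p ON s.product_id = p.product_id
    GROUP BY p.description ORDER BY p.description
"""

# ... versus partially grouping sale by product_id before the join.
# The final GROUP BY is still needed: two products share a description.
GROUP_THEN_JOIN = """
    SELECT p.description, sum(s.qty)
    FROM (SELECT product_id, sum(qty) AS qty
          FROM sale GROUP BY product_id) s
    INNER JOIN product p ON s.product_id = p.product_id
    GROUP BY p.description ORDER BY p.description
"""

if __name__ == "__main__":
    print(run(UNION_THEN_GROUP), run(GROUP_THEN_UNION))
    print(run(JOIN_THEN_GROUP), run(GROUP_THEN_JOIN))
```

In both pairs the second query feeds fewer rows into the upper node (the append or the join) at the cost of an extra aggregation step, which is why it would only be one more path to cost rather than an unconditional win.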

-- 
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
