Re: Combining Aggregates - Mailing list pgsql-hackers

From David Rowley
Subject Re: Combining Aggregates
Date
Msg-id CAApHDvpFQ-TYta2sTzcGCpfZ9B5b9h75cc99usVekQo=qN_LaA@mail.gmail.com
Whole thread Raw
In response to Combining Aggregates  (Simon Riggs <simon@2ndQuadrant.com>)
Responses Re: Combining Aggregates  (Simon Riggs <simon@2ndQuadrant.com>)
List pgsql-hackers
On 17 December 2014 at 22:53, Simon Riggs <simon@2ndquadrant.com> wrote:
KaiGai, David Rowley and myself have all made mention of various ways
we could optimize aggregates.

Following WIP patch adds an extra function called a "combining
function", that is intended to allow the user to specify a
semantically correct way of breaking down an aggregate into multiple
steps.

Gents, is this what you were thinking? If not...


Very much so! You must have missed my patch.


The cases I think that may benefit from such infrastructure would be:

1. Parallel queries, where each worker could work on a portion of the tuples being aggregated and then the combine/merge function is called in the end in order to produce the final aggregated result. We currently don't yet have parallel query, so we can't really commit anything yet, for that reason.

2. Queries such as:

SELECT p.name, SUM(s.qty) FROM sales s INNER JOIN product p ON s.product_id = s.product_id GROUP BY p.name;

Such a query could be transformed into:

SELECT p.name,SUM(qty) FROM (SELECT product_id,SUM(qty) AS qty FROM sales GROUP BY product_id) s
INNER JOIN product p ON p.product_id = s.product_id GROUP BY p_name;

Of course the outer query's SUM and GROUP BY would not be required if there happened to be a UNIQUE index on product(name), but assuming there's not then the above should produce the results faster. This of course works ok for SUM(), but for something like AVG() or STDDEV() the combine/merge aggregate functions would be required to process those intermediate aggregate results that were produced by the sub-query.

Regards

David Rowley 
 

pgsql-hackers by date:

Previous
From: Simon Riggs
Date:
Subject: Re: Combining Aggregates
Next
From: Magnus Hagander
Date:
Subject: Re: analyze_new_cluster.bat and delete_old_cluster.bat not ignored with vcregress upgradecheck