Re: Partial aggregates pushdown - Mailing list pgsql-hackers

From Tomas Vondra
Subject Re: Partial aggregates pushdown
Date
Msg-id b5a2bb24-ba7d-4053-b298-7e6247fd4043@enterprisedb.com
Whole thread Raw
In response to Partial aggregates pushdown  (Alexander Pyhalov <a.pyhalov@postgrespro.ru>)
Responses Re: Partial aggregates pushdown
Re: Partial aggregates pushdown
List pgsql-hackers
Hi Alexander,

On 10/15/21 15:15, Alexander Pyhalov wrote:
> Hi.
> 
> One of the issues when we try to use sharding in PostgreSQL is absence 
> of partial aggregates pushdown.
> 
> I see several opportunities to alleviate this issue.
> If we look at Citus, it implements aggregate, calculating internal state 
> of an arbitrary agregate function and exporting it as text. So we could 
> calculate internal states independently on all data sources and then 
> finalize it, which allows to compute arbitrary aggregate.
> 
> But, as mentioned in [1] thread, for some functions (like 
> count/max/min/sum) we can just push down them. It seems easy and covers 
> a lot of cases.
> For now there are still issues - for example you can't handle functions 
> as avg() as we should somehow get its internal state or sum() variants, 
> which need aggserialfn/aggdeserialfn. Preliminary version is attached.
> 
> Is someone else working on the issue? Does suggested approach make sense?
> 

I think a couple people worked on this (or something similar/related) in 
the past, but I don't recall any recent patches.

IMHO being able to push-down parts of an aggregation to other nodes is a 
very desirable feature, that might result in huge improvements for some 
analytical workloads.

As for the proposed approach, it's probably good enough for the first 
version to restrict this to aggregates where the aggregate result is 
sufficient, i.e. we don't need any new export/import procedures.

But it's very unlikely we'd want to restrict it the way the patch does 
it, i.e. based on aggregate name. That's both fragile (people can create 
new aggregates with such name) and against the PostgreSQL extensibility 
(people may implement custom aggregates, but won't be able to benefit 
from this just because of name).

So for v0 maybe, but I think there neeeds to be a way to relax this in 
some way, for example we could add a new flag to pg_aggregate to mark 
aggregates supporting this.

And then we should extend this for aggregates with more complex internal 
states (e.g. avg), by supporting a function that "exports" the aggregate 
state - similar to serial/deserial functions, but needs to be portable.

I think the trickiest thing here is rewriting the remote query to call 
this export function, but maybe we could simply instruct the remote node 
to use a different final function for the top-level node?


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



pgsql-hackers by date:

Previous
From: Andrew Dunstan
Date:
Subject: Re: BUG #17212: pg_amcheck fails on checking temporary relations
Next
From: Alexander Pyhalov
Date:
Subject: Re: Partial aggregates pushdown