Re: Partial aggregates pushdown - Mailing list pgsql-hackers

From Tomas Vondra
Subject Re: Partial aggregates pushdown
Date
Msg-id 982a903e-cb59-4b60-911a-bdbb88da594e@vondra.me
Whole thread Raw
In response to Re: Partial aggregates pushdown  (Jelte Fennema-Nio <postgres@jeltef.nl>)
Responses Re: Partial aggregates pushdown
Re: Partial aggregates pushdown
List pgsql-hackers

On 8/8/24 13:48, Jelte Fennema-Nio wrote:
> SUMMARY OF THREAD
> 
> The design of patch 0001 is agreed upon by everyone on the thread (so
> far). This adds the PARTIAL_AGGREGATE label for aggregates, which will
> cause the finalfunc not to run. It also starts using PARTIAL_AGGREGATE
> for pushdown of aggregates in postgres_fdw. In 0001 PARTIAL_AGGREGATE
> is only supported for aggregates with a non-internal/pseudo type as
> the stype.
> 

I don't have a strong opinion on this, but I wonder if someone might
object this essentially extends the syntax with something that is not
(and never will be) in the SQL standard. I wonder if there's some
precedent for encoding such explicit execution instructions into the
query itself?

That reminds me - the PARTIAL_AGGREGATE label is per aggregate, but I
don't think it makes much sense to do this only for some aggregates,
right? Do we have a way to make sure the query is "consistent"? I'm not
sure if doing this on the source (before pushdown) is enough. Could
there be a difference in what the remote instance supports?

The only alternative that I can think of (and that I believe was already
mentioned in this thread) is to set some GUC that forces the top-most
query level to do this (all aggregates at that level). That's have the
benefit of always affecting all aggregates.

> The design for patch 0002 is still under debate. This would expand on
> the functionality added by adding support for PARTIAL_AGGREGATE for
> aggregates with an internal stype. This is done by returning a byte
> array containing the bytes that the serialfunc of the aggregate
> returns.
> 
> A competing proposal for 0002 is to instead change aggregates to not
> use an internal stype anymore, and create dedicated types. The main
> downside here is that infunc and outfunc would need to be added for
> text serialization, in addition to the binary serialization. An open
> question is: Can we change the requirements for CREATE TYPE, so that
> types can be created without infunc and outfunc.
> 

I think it's +0.5 for the new dedicated data types from me.

I admit I'm too lazy to read the whole thread from scratch, but I
believe we did discuss the possibility to reuse the serial/deserial
functions we already have, but the reason against that was the missing
cross-version stability. Parallel queries always run within a single
instance, hence there are no concerns about other versions. But this is
meant to support the remote node having a wildly different version.

I guess we might introduce another pair of serial/deserial functions,
with this guarantee. I know we (me and Jelte) discussed that in person
at some point, and there were arguments for doing the data types. But I
managed to forget the details :-(

> WHAT IS NEEDED?
> 
> The things needed for this patch are that docs need to be added, and
> detailed codereview needs to be done.

Yeah, I think the docs are must-have for a proper review.

> Feedback from more people on the two competing proposals for 0002
> would be very helpful in making a decision.
> 



regards

-- 
Tomas Vondra



pgsql-hackers by date:

Previous
From: Nathan Bossart
Date:
Subject: Re: Remove dependence on integer wrapping
Next
From: Peter Eisentraut
Date:
Subject: Re: [PATCH] Add get_bytes() and set_bytes() functions