Re: Partial aggregates pushdown - Mailing list pgsql-hackers
From | Tomas Vondra |
---|---|
Subject | Re: Partial aggregates pushdown |
Date | |
Msg-id | 982a903e-cb59-4b60-911a-bdbb88da594e@vondra.me Whole thread Raw |
In response to | Re: Partial aggregates pushdown (Jelte Fennema-Nio <postgres@jeltef.nl>) |
Responses |
Re: Partial aggregates pushdown
Re: Partial aggregates pushdown |
List | pgsql-hackers |
On 8/8/24 13:48, Jelte Fennema-Nio wrote: > SUMMARY OF THREAD > > The design of patch 0001 is agreed upon by everyone on the thread (so > far). This adds the PARTIAL_AGGREGATE label for aggregates, which will > cause the finalfunc not to run. It also starts using PARTIAL_AGGREGATE > for pushdown of aggregates in postgres_fdw. In 0001 PARTIAL_AGGREGATE > is only supported for aggregates with a non-internal/pseudo type as > the stype. > I don't have a strong opinion on this, but I wonder if someone might object this essentially extends the syntax with something that is not (and never will be) in the SQL standard. I wonder if there's some precedent for encoding such explicit execution instructions into the query itself? That reminds me - the PARTIAL_AGGREGATE label is per aggregate, but I don't think it makes much sense to do this only for some aggregates, right? Do we have a way to make sure the query is "consistent"? I'm not sure if doing this on the source (before pushdown) is enough. Could there be a difference in what the remote instance supports? The only alternative that I can think of (and that I believe was already mentioned in this thread) is to set some GUC that forces the top-most query level to do this (all aggregates at that level). That's have the benefit of always affecting all aggregates. > The design for patch 0002 is still under debate. This would expand on > the functionality added by adding support for PARTIAL_AGGREGATE for > aggregates with an internal stype. This is done by returning a byte > array containing the bytes that the serialfunc of the aggregate > returns. > > A competing proposal for 0002 is to instead change aggregates to not > use an internal stype anymore, and create dedicated types. The main > downside here is that infunc and outfunc would need to be added for > text serialization, in addition to the binary serialization. An open > question is: Can we change the requirements for CREATE TYPE, so that > types can be created without infunc and outfunc. > I think it's +0.5 for the new dedicated data types from me. I admit I'm too lazy to read the whole thread from scratch, but I believe we did discuss the possibility to reuse the serial/deserial functions we already have, but the reason against that was the missing cross-version stability. Parallel queries always run within a single instance, hence there are no concerns about other versions. But this is meant to support the remote node having a wildly different version. I guess we might introduce another pair of serial/deserial functions, with this guarantee. I know we (me and Jelte) discussed that in person at some point, and there were arguments for doing the data types. But I managed to forget the details :-( > WHAT IS NEEDED? > > The things needed for this patch are that docs need to be added, and > detailed codereview needs to be done. Yeah, I think the docs are must-have for a proper review. > Feedback from more people on the two competing proposals for 0002 > would be very helpful in making a decision. > regards -- Tomas Vondra
pgsql-hackers by date: