RE: Partial aggregates pushdown - Mailing list pgsql-hackers

From Fujii.Yuki@df.MitsubishiElectric.co.jp"
Subject RE: Partial aggregates pushdown
Date
Msg-id TY2PR01MB38353E34056D0F91736F358395D42@TY2PR01MB3835.jpnprd01.prod.outlook.com
Whole thread Raw
In response to Re: Partial aggregates pushdown  (Jelte Fennema-Nio <postgres@jeltef.nl>)
Responses Re: Partial aggregates pushdown
List pgsql-hackers
Hi. Jelte, hackers.

Thank you for your proposal and comments.

> From: Jelte Fennema-Nio <postgres@jeltef.nl>
> Sent: Monday, June 24, 2024 6:09 PM
> > 1. Generality
> > I believe we should develop a method that can theoretically apply to any
> aggregate function, even if we cannot implement it immediately. However, I find
> it exceptionally challenging to demonstrate that any internal transtype can be
> universally converted to a native data type for aggregate functions that are
> parallel-safe under the current parallel query feature. Specifically, proving this
> for user-defined aggregate functions presents a significant difficulty in my
> view.
> > On the other hand, I think that the usage of export and import functions can
> theoretically apply to any aggregate functions.
> 
> The only thing required when doing CREATE TYPE is having an INPUT and
> OUTPUT function for the type, which (de)serialize the type to text format. As
> far as I can tell by definition that requirement should be fine for any aggregates
> that we can do partial aggregation pushdown for. To clarify I'm not suggesting
> we should change any of the internal representation of the type for the current
> internal aggregates. I'm suggesting we create new native types (i.e. do CREATE
> TYPE) for those internal representations and then use the name of that type
> instead of internal.
I see. I maybe got your proposal.
Refer to your proposal, for avg(int8), 
I create a new native type like state_int8_avg
with the new typsend/typreceive functions
and use them to transmit the state value, right?

That might seem to be a more fundamental solution
because I can avoid adding export/import functions of my proposal,
which are the new components of aggregate function.
I have never considered the solution.
I appreciate your proposal.

However, I still have the following two questions.

1. Not necessary components of new native types
Refer to pg_type.dat, typinput and typoutput are required.
I think that in your proposal they are not necessary,
so waste. I think that it is not acceptable.
How can I resolve the problem? 

2. Many new native types
I think that, basically, each aggregate function does need a new native type.
For example,
avg(int8), avg(numeric), and var_pop(int4) has the same transtype, PolyNumAggState.
You said that it is enough to add only one native type like state_poly_num_agg
for supporting them, right?

But the combine functions of them might have quite different expectation
on the data items of PolyNumAggState like
the range of N(means count) and the true/false of calcSumX2
(the flag of calculating sum of squares).
The final functions of them have the similar expectation.
So, I think that, responded to your proposal,
each of them need a native data type
like state_int8_avg, state_numeric_avg, for safety.

And, we do need a native type for an aggregate function
whose transtype is not internal and not pseudo.
For avg(int4), the current transtype is _int8.
However, I do need a validation check on the number of the array
And the positiveness of count(the first element of the array).
Responded to your proposal,
I do need a new native type like state_int4_avg.

Consequently, I think that, responded to your proposal, finally
each of aggregate functions need a new native type
with typinput and typoutput.
That seems need the same amount of codes and more catalog data,
right?

> > 2. Amount of codes.
> > It could need more codes.
> 
> I think it would be about the same as your proposal. Instead of serializing to an
> intermediary existing type you serialize to string straight away. I think it would
> probably be slightly less code actually, since you don't have to add code to
> handle the new aggpartialexportfn and aggpartialimportfn columns.
> 
> > 3. Concern about performance
> > I'm concerned that altering the current internal data types could impact
> performance.
> 
> As explained above in my proposal all the aggregation code would remain
> unchanged, only new native types will be added. Thus performance won't be
> affected, because all aggregation code will be the same. The only thing that's
> changed is that the internal type now has a name and an INPUT and OUTPUT
> function.
I got it. Thank you.

> Not a full review but some initial notes:
Thank you. I don't have time today, so I'll answer after tomorrow.

Best regards, Yuki Fujii
--
Yuki Fujii
Information Technology R&D Center, Mitsubishi Electric Corporation

pgsql-hackers by date:

Previous
From: Shlok Kyal
Date:
Subject: Re: Pgoutput not capturing the generated columns
Next
From: Jelte Fennema-Nio
Date:
Subject: Re: Add new protocol message to change GUCs for usage with future protocol-only GUCs