RE: Partial aggregates pushdown - Mailing list pgsql-hackers
From | Fujii.Yuki@df.MitsubishiElectric.co.jp" |
---|---|
Subject | RE: Partial aggregates pushdown |
Date | |
Msg-id | TYAPR01MB55149B8569F2FDA262654C4F95BAA@TYAPR01MB5514.jpnprd01.prod.outlook.com Whole thread Raw |
In response to | Re: Partial aggregates pushdown (Alexander Pyhalov <a.pyhalov@postgrespro.ru>) |
Responses |
Re: Partial aggregates pushdown
Re: Partial aggregates pushdown |
List | pgsql-hackers |
Hi Mr. Haas, hackers. Thank you for your thoughtful comments. > From: Robert Haas <robertmhaas@gmail.com> > Sent: Tuesday, November 21, 2023 5:52 AM > I do have a concern about this, though. It adds a lot of bloat. It adds a whole lot of additional entries to pg_aggregate,and > every new aggregate we add in the future will require a bonus entry for this, and it needs a bunch of new pg_proc entries > as well. One idea that I've had in the past is to instead introduce syntax that just does this, without requiring a separate > aggregate definition in each case. > For example, maybe instead of changing string_agg(whatever) to string_agg_p_text_text(whatever), you can say > PARTIAL_AGGREGATE > string_agg(whatever) or string_agg(PARTIAL_AGGREGATE whatever) or something. Then all aggregates could be treated > in a generic way. I'm not completely sure that's better, but I think it's worth considering. I believe this comment addresses a fundamental aspect of the approach. So, firstly, could we discuss whether we should fundamentally reconsider the approach? The approach adopted in this patch is as follows. Approach 1: Adding partial aggregation functions to the catalogs(pg_aggregate, pg_proc) The approach proposed by Mr.Haas is as follows. Approach 2: Adding a keyword to the SQL syntax to indicate partial aggregation requests The amount of code required to implement Approach 2 has not been investigated, but comparing Approach 1 and Approach 2 in other aspects, I believe they each have the following advantages and disadvantages. 1. Approach 1 (1) Advantages (a) No need to change the SQL syntax (2) Disadvantages (a) Catalog bloat As Mr.Haas pointed out, the catalog will bloat by adding partial aggregation functions (e.g. avg_p_int8(int8)) for each individual aggregate function (e.g. avg(int8)) in pg_aggregate and pg_proc (theoretically doubling the size). Some PostgreSQL developers and users may find this uncomfortable. (b) Increase in manual procedures Developers of new aggregate functions (both built-in and user-defined) need to manually add the partial aggregation functions when defining the aggregate functions. However, the procedure for adding partial aggregation functions for a certain aggregate function can be automated, so this problem can be resolved by improving the patch. The automation method involves the core part (AggregateCreate() and related functions) that executes the CREATE AGGREGATE command for user-defined functions. For built-in functions, it involves generating the initial data for the pg_aggregate catalog and pg_proc catalog from pg_aggregate.datand pg_proc.dat (using the genbki.pl script and related scripts). 2. Approach 2 (1) Advantages (a) No need to add partial aggregate functions to the catalogs for each aggregation (2) Disadvantages (a) Need to add non-standard keywords to the SQL syntax. I did not choose Approach2 because I was not confident that the disadvantage mentioned in 2.(2)(a) would be accepted by the PostgreSQL development community. If it is accepted, I think Approach 2 is smarter. Could you please provide your opinion on which approach is preferable after comparing these two approaches? If we cannot say anything without comparing the amount of source code, as Mr.Momjian mentioned, we need to estimate the amount of source code required to implement Approach2. Sincerely yours, Yuuki Fujii -- Yuuki Fujii Information Technology R&D Center Mitsubishi Electric Corporation
pgsql-hackers by date: