RE: Partial aggregates pushdown - Mailing list pgsql-hackers
From | Fujii.Yuki@df.MitsubishiElectric.co.jp |
---|---|
Subject | RE: Partial aggregates pushdown |
Date | |
Msg-id | TY2PR01MB3835C0DC967E6958C4C8040995D92@TY2PR01MB3835.jpnprd01.prod.outlook.com |
In response to | Re: Partial aggregates pushdown (Bruce Momjian <bruce@momjian.us>) |
Responses |
Re: Partial aggregates pushdown
|
List | pgsql-hackers |
Hi Jelte and hackers,

I've reconsidered which of the following two approaches is better.

Approach1: Adding export/import functions to transmit state values.
Approach2: Adding native types which are equal to state values.

In my mind, Approach1 is superior. Therefore, if there are no objections this week, I plan to resume implementing Approach1 next week. I would appreciate it if anyone could discuss the topic with me or ask questions.

I believe that while Approach1 has the extendability to support situations where local and remote major versions differ, Approach2 lacks this extendability. Additionally, it seems that Approach1 requires fewer additional lines of code compared to Approach2. I'm also concerned that Approach2 may cause the catalog pg_type to bloat. Although Approach2 offers the benefit of avoiding the addition of columns to pg_aggregate, I think this benefit is smaller than the advantages of Approach1 mentioned above.

Next, I will present my complete comparison. The comparison points are as follows:
1. Extendability
2. Amount of code
3. Catalog size
4. Developer burden
5. Additional columns to catalogs

1. Extendability

I believe it is crucial to support scenarios where the local and remote major versions may differ in the future (see below).
https://www.postgresql.org/message-id/4012625.1701120204%40sss.pgh.pa.us

Regarding this aspect, I consider Approach1 superior to Approach2. The reasons are:
・The data type of an aggregate function's state value may change with each major version increment.
・In Approach1, by extending the export/import functionality to include the major version in which the state value was created (refer to p.16 and p.17 of [1]), I can handle such situations.
・On the other hand, it appears that Approach2 fundamentally lacks the capability to support these scenarios.

2. Amount of code

Regarding this aspect, I find Approach1 to be better than Approach2.
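As a rough illustration of the version-tagged export/import idea from item 1, here is a minimal sketch in Python. It is only a model of the concept, not PostgreSQL code: the function names (export_state, import_state, combine), the two-byte version header, and the per-version state layouts for an avg()-like aggregate are all assumptions made up for this example.

```python
import struct

# Hypothetical state layouts for an avg()-like aggregate, keyed by the
# major version that produced the state. These layouts are invented for
# illustration; they are not PostgreSQL's actual transition-state formats.
_LAYOUTS = {
    17: struct.Struct("!iq"),  # count: int32, sum: int64
    18: struct.Struct("!qq"),  # count: int64, sum: int64
}
_HEADER = struct.Struct("!H")  # major version of the exporting server


def export_state(major_version, count, total):
    """Serialize a partial-aggregate state, prefixed with its major version."""
    layout = _LAYOUTS[major_version]
    return _HEADER.pack(major_version) + layout.pack(count, total)


def import_state(payload):
    """Decode a state exported by any known major version into (count, sum)."""
    (major_version,) = _HEADER.unpack_from(payload, 0)
    layout = _LAYOUTS.get(major_version)
    if layout is None:
        raise ValueError(f"unsupported state version: {major_version}")
    count, total = layout.unpack_from(payload, _HEADER.size)
    return count, total


def combine(states):
    """Merge imported partial states and finish the average."""
    count = sum(c for c, _ in states)
    total = sum(t for _, t in states)
    return total / count if count else None


# Two remote servers on different (hypothetical) major versions send
# partial states; the local side imports both and combines them.
partials = [
    import_state(export_state(17, 2, 10)),
    import_state(export_state(18, 3, 20)),
]
average = combine(partials)
```

The point of the sketch is that the importing side dispatches on the version in the header, so a change in the state layout between major versions only requires a new entry on the import side. With a native type whose input function sees only the value itself, there is no obvious place to carry that version information.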
In Approach1, developers only need to write export/import functions and can use a standardized format for transmitting state values.

In Approach2, developers have two options:
Option1: Adding typinput/typoutput and typsend/typreceive.
Option2: Adding typinput/typoutput only.
Option1 requires more lines of code, which may be seen as cumbersome by some developers. Option2 restricts developers to using only the text representation for transmitting state values, which I consider limiting.

3. Catalog size

Regarding this point, I believe Approach1 is better than Approach2.
In Approach1, theoretically, it is necessary to add export/import functions to pg_proc for each aggregate.
In Approach2, theoretically, it is necessary to add typoutput/typinput functions (and typsend/typreceive if necessary) to pg_proc and add a native type to pg_type for each aggregate.
I would like to emphasize that we should consider user-defined functions in addition to built-in aggregate functions. I think most developers prefer to avoid bloating catalogs, even if they may not be able to specify exact reasons. In fact, in Robert's previous review, he expressed a similar concern (see below).
https://www.postgresql.org/message-id/CA%2BTgmobvja%2Bjytj5zcEcYgqzOaeJiqrrJxgqDf1q%3D3k8FepuWQ%40mail.gmail.com

4. Developer burden

Regarding this aspect, I believe Approach1 is better than Approach2.
In Approach1, developers have the following additional task:
Task1-1: Create and define export/import functions.
In Approach2, developers have the following additional tasks:
Task2-1: Create and define typinput/typoutput functions (and typsend/typreceive functions if necessary).
Task2-2: Define a native type.
Approach1 requires fewer additional tasks, although the difference may not be substantial.

5. Additional columns to catalogs

Regarding this aspect, Approach2 is better than Approach1.
Approach1 requires three additional columns in pg_aggregate, specifically the aggpartialpushdownsafe flag, an export function reference, and an import function reference. Approach2 does not require any additional columns in catalogs. However, over the past four years of discussions, no one has expressed concerns about additional columns in catalogs.

[1] https://www.postgresql.org/message-id/attachment/160659/PGConfDev2024_Presentation_Aggregation_Scaleout_FDW_Sharding_20240531.pdf

Best regards, Yuki Fujii

--
Yuki Fujii
Information Technology R&D Center, Mitsubishi Electric Corporation