RE: Partial aggregates pushdown - Mailing list pgsql-hackers

From	"Fujii.Yuki@df.MitsubishiElectric.co.jp"
Subject	RE: Partial aggregates pushdown
Date	July 7, 2024 21:46:31
Msg-id	TY2PR01MB3835C0DC967E6958C4C8040995D92@TY2PR01MB3835.jpnprd01.prod.outlook.com
In response to	Re: Partial aggregates pushdown (Bruce Momjian <bruce@momjian.us>)
Responses	Re: Partial aggregates pushdown
List	pgsql-hackers

Hi Jelte and hackers,

I've reconsidered which of the following two approaches is better.
  Approach1: Adding export/import functions to transmit state values.
  Approach2: Adding native types which are equal to state values.

In my mind, Approach1 is superior. Therefore, if there are no objections this week, I plan to resume implementing
Approach1 next week. I would appreciate it if anyone could discuss the topic with me or ask questions.

I believe that while Approach1 has the extendability to support situations where local and remote major versions
differ, Approach2 lacks this extendability. Additionally, it seems that Approach1 requires fewer additional lines of
code compared to Approach2. I'm also concerned that Approach2 may cause the catalog pg_type to bloat.

Although Approach2 offers the benefit of avoiding the addition of columns to pg_aggregate, I think this benefit is
smaller than the advantages of Approach1 mentioned above.

Next, I will present my complete comparison. The comparison points are as follows:
  1. Extendability
  2. Amount of code
  3. Catalog size
  4. Developer burden
  5. Additional columns to catalogs

1. Extendability
I believe it is crucial to support scenarios where the local and remote major versions may differ in the future (see
the message below).

https://www.postgresql.org/message-id/4012625.1701120204%40sss.pgh.pa.us

Regarding this aspect, I consider Approach1 superior to Approach2. The reason is that:
・The data type of an aggregate function's state value may change with each major version increment.
・In Approach1, by extending the export/import functionalities to include the major version in which the state value was
created (refer to p.16 and p.17 of [1]), I can handle such situations.
・On the other hand, it appears that Approach2 fundamentally lacks the capability to support these scenarios.
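
As an illustration of the version-tagging idea in the points above, here is a minimal Python sketch. It is not taken from the patch or from [1]: the wire format (a 4-byte major version followed by the payload), the function names export_state/import_state, and the pretended layout change between versions 16 and 17 are all assumptions made for the example.

```python
import struct

# Assumed wire format: 4-byte big-endian major version, then a
# version-specific payload. This is illustrative, not the patch's format.
VERSION_HEADER = struct.Struct("!I")

# Hypothetical layouts: pretend the sum field widened from int32 to int64
# between major versions 16 and 17.
PAYLOAD_BY_VERSION = {
    16: struct.Struct("!qi"),  # count: int64, sum: int32
    17: struct.Struct("!qq"),  # count: int64, sum: int64
}


def export_state(count, total, major_version):
    """Serialize a (count, sum) state, tagged with the producing version."""
    layout = PAYLOAD_BY_VERSION[major_version]
    return VERSION_HEADER.pack(major_version) + layout.pack(count, total)


def import_state(data):
    """Read the version tag first, then decode the payload accordingly."""
    (major_version,) = VERSION_HEADER.unpack_from(data)
    layout = PAYLOAD_BY_VERSION.get(major_version)
    if layout is None:
        raise ValueError(f"unsupported state version: {major_version}")
    return layout.unpack_from(data, VERSION_HEADER.size)
```

Because the importing side reads the tag before touching the payload, it can accept a state produced by a different major version, or reject one it does not understand, without guessing at the layout.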

2. Amount of code
Regarding this aspect, I find Approach1 to be better than Approach2.
In Approach1, developers only need to add export/import functions and can use a standardized format for transmitting
state values.
In Approach2, developers have two options:
  Option1: Adding typinput/typoutput and typsend/typreceive.
  Option2: Adding typinput/typoutput only.
Option1 requires more lines of code, which may be seen as cumbersome by some developers.
Option2 restricts developers to using only text representation for transmitting state values, which I consider
limiting.
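
To make the difference between the two options concrete, the following Python sketch encodes the same (count, sum) state once as text and once as binary. It only mimics the roles of typinput/typoutput and typsend/typreceive; the function names and both encodings are assumptions for illustration, not PostgreSQL's actual formats.

```python
import struct


def state_to_text(count, total):
    """Text form, playing the role of a typoutput function."""
    return f"{count}:{total!r}"


def state_from_text(text):
    """Parse the text form, playing the role of a typinput function."""
    count_str, total_str = text.split(":")
    return int(count_str), float(total_str)


def state_to_binary(count, total):
    """Binary form, playing the role of a typsend function."""
    return struct.pack("!qd", count, total)  # int64 count, float64 sum


def state_from_binary(data):
    """Decode the binary form, playing the role of a typreceive function."""
    return struct.unpack("!qd", data)
```

In this sketch the binary form always occupies 16 bytes and needs no parsing, while the text form varies in length with the values; supporting both means writing four functions instead of two.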

3. Catalog size
Regarding this point, I believe Approach1 is better than Approach2.
In Approach1, theoretically, it is necessary to add export/import functions to pg_proc for each aggregate.
In Approach2, theoretically, it is necessary to add typoutput/typinput functions (and typsend/typreceive if necessary)
to pg_proc and add a native type to pg_type for each aggregate.
I would like to emphasize that we should consider user-defined functions in addition to built-in aggregate functions.
I think most developers prefer to avoid bloating catalogs, even if they may not be able to specify exact reasons.
In fact, in Robert's previous review, he expressed a similar concern (see below).

https://www.postgresql.org/message-id/CA%2BTgmobvja%2Bjytj5zcEcYgqzOaeJiqrrJxgqDf1q%3D3k8FepuWQ%40mail.gmail.com
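
The per-aggregate counts in point 3 can be tallied with a small Python helper. This is only a back-of-the-envelope sketch: it counts just the catalog rows named above (one row per function, one per type), assumes nothing is shared between aggregates, and the function name added_catalog_rows is made up for the example.

```python
def added_catalog_rows(num_aggregates, approach, with_send_receive=False):
    """Count the catalog rows named in point 3 for a number of aggregates."""
    if approach == 1:
        # Approach1: one export and one import function per aggregate.
        return {"pg_proc": 2 * num_aggregates, "pg_type": 0}
    if approach == 2:
        # Approach2: typinput/typoutput, optionally typsend/typreceive,
        # plus one native type per aggregate.
        functions = 4 if with_send_receive else 2
        return {"pg_proc": functions * num_aggregates,
                "pg_type": num_aggregates}
    raise ValueError("approach must be 1 or 2")
```

Under these assumptions, 10 aggregates add 20 pg_proc rows in Approach1, versus 20 to 40 pg_proc rows plus 10 pg_type rows in Approach2.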

4. Developer burden
Regarding this aspect, I believe Approach1 is better than Approach2.
In Approach1, developers have the following additional tasks:
  Task1-1: Create and define export/import functions.

In Approach2, developers have the following additional tasks:
  Task2-1: Create and define typoutput/typinput functions (and typsend/typreceive functions if necessary).
  Task2-2: Define a native type.

Approach1 requires fewer additional tasks, although the difference may not be substantial.

5. Additional columns to catalogs
Regarding this aspect, Approach2 is better than Approach1.
Approach1 requires three additional columns in pg_aggregate, specifically the aggpartialpushdownsafe flag, export
function reference, and import function reference.
Approach2 does not require any additional columns in catalogs.
However, over the past four years of discussions, no one has expressed concerns about additional columns in catalogs.

[1]
https://www.postgresql.org/message-id/attachment/160659/PGConfDev2024_Presentation_Aggregation_Scaleout_FDW_Sharding_20240531.pdf

Best regards, Yuki Fujii
--
Yuki Fujii
Information Technology R&D Center, Mitsubishi Electric Corporation
