RE: [CAUTION!! freemail] Re: Partial aggregates pushdown - Mailing list pgsql-hackers

From Fujii.Yuki@df.MitsubishiElectric.co.jp"
Subject RE: [CAUTION!! freemail] Re: Partial aggregates pushdown
Date
Msg-id OS3PR01MB6660AE8BB4D08125EB4D3279950D9@OS3PR01MB6660.jpnprd01.prod.outlook.com
Whole thread Raw
In response to Re: Partial aggregates pushdown  (Ted Yu <yuzhihong@gmail.com>)
Responses Re: [CAUTION!! freemail] Re: Partial aggregates pushdown
List pgsql-hackers
Hi Mr.Yu.

Thank you for comments.

> + * Check that partial aggregate agg has compatibility
> 
> If the `agg` refers to func parameter, the parameter name is aggform
I fixed the above typo and made the above comment easy to understand
New comment is "Check that partial aggregate function of aggform exsits in remote"

> +       int32  partialagg_minversion = PG_VERSION_NUM;
> +       if (aggform->partialagg_minversion ==
> PARTIALAGG_MINVERSION_DEFAULT) {
> +               partialagg_minversion = PG_VERSION_NUM;
> 
> 
> I am curious why the same variable is assigned the same value twice. It seems
> the if block is redundant.
> 
> +       if ((fpinfo->server_version >= partialagg_minversion)) {
> +               compatible = true;
> 
> 
> The above can be simplified as: return fpinfo->server_version >=
> partialagg_minversion;
I fixed according to your comment.

Sincerely yours,
Yuuki Fujii

--
Yuuki Fujii
Information Technology R&D Center Mitsubishi Electric Corporation

> -----Original Message-----
> From: Ted Yu <yuzhihong@gmail.com>
> Sent: Tuesday, November 22, 2022 2:00 PM
> To: Fujii Yuki/藤井 雄規(MELCO/情報総研 DM最適G)
> <Fujii.Yuki@df.MitsubishiElectric.co.jp>
> Cc: Alexander Pyhalov <a.pyhalov@postgrespro.ru>; Tomas Vondra
> <tomas.vondra@enterprisedb.com>; PostgreSQL-development
> <pgsql-hackers@postgresql.org>; Andres Freund <andres@anarazel.de>;
> Zhihong Yu <zyu@yugabyte.com>; Julien Rouhaud <rjuju123@gmail.com>;
> Daniel Gustafsson <daniel@yesql.se>; Ilya Gladyshev
> <i.gladyshev@postgrespro.ru>
> Subject: [CAUTION!! freemail] Re: Partial aggregates pushdown
> 
> 
> 
> On Mon, Nov 21, 2022 at 5:02 PM Fujii.Yuki@df.MitsubishiElectric.co.jp
> <mailto:Fujii.Yuki@df.MitsubishiElectric.co.jp>
> <Fujii.Yuki@df.mitsubishielectric.co.jp
> <mailto:Fujii.Yuki@df.mitsubishielectric.co.jp> > wrote:
> 
> 
>     Hi Mr.Vondra, Mr.Pyhalov, Everyone.
> 
>     I discussed with Mr.Pyhalov about the above draft by directly sending
> mail to
>      him(outside of pgsql-hackers). Mr.Pyhalov allowed me to update his
> patch
>     along with the above draft. So I update Mr.Pyhalov's patch v10.
> 
>     I wrote my patch for discussion.
>     My patch passes regression tests which contains additional basic
> postgres_fdw tests
>     for my patch's feature. But my patch doesn't contain sufficient
> documents and tests.
>     If reviewers accept my approach, I will add documents and tests to my
> patch.
> 
>     The following is a my patch's readme.
>     # I simplified the above draft.
> 
>     --readme of my patch
>     1. interface
>     1) pg_aggregate
>     There are the following additional columns.
>     a) partialaggfn
>       data type    : regproc.
>       default value: zero(means invalid).
>       description  : This field refers to the special aggregate
> function(then we call
>          this partialaggfunc)
>         corresponding to aggregation function(then we call src) which has
> aggfnoid.
>         partialaggfunc is used for partial aggregation pushdown by
> postgres_fdw.
>         The followings are differences between the src and the special
> aggregate function.
>           difference1) result type
>             The result type is same as the src's transtype if the src's
> transtype
>             is not internal.
>             Otherwise the result type is bytea.
>           difference2) final func
>             The final func does not exist if the src's transtype is not
> internal.
>             Otherwize the final func returns serialized value.
>         For example, there is a partialaggfunc avg_p_int4 which
> corresponds to avg(int4)
>         whose aggtranstype is _int4.
>         The result value of avg_p_int4 is a float8 array which consists of
> count and
>         summation. avg_p_int4 does not have finalfunc.
>         For another example, there is a partialaggfunc avg_p_int8 which
> corresponds to
>         avg(int8) whose aggtranstype is internal.
>         The result value of avg_p_int8 is a bytea serialized array which
> consists of count
>         and summation. avg_p_int8 has finalfunc int8_avg_serialize
> which is serialize function
>         of avg(int8). This field is zero if there is no partialaggfunc.
> 
>     b) partialagg_minversion
>       data type    : int4.
>       default value: zero(means current version).
>       description  : This field is the minimum PostgreSQL server version
> which has
>         partialaggfunc. This field is used for checking compatibility of
> partialaggfunc.
> 
>     The above fields are valid in tuples for builtin avg, sum, min, max,
> count.
>     There are additional records which correspond to partialaggfunc for
> avg, sum, min, max,
>     count.
> 
>     2) pg_proc
>     There are additional records which correspond to partialaggfunc for
> avg, sum, min, max,
>     count.
> 
>     3) postgres_fdw
>     postgres_fdw has an additional foreign server option server_version.
> server_version is
>     integer value which means remote server version number. Default
> value of server_version
>     is zero. server_version is used for checking compatibility of
> partialaggfunc.
> 
>     2. feature
>     postgres_fdw can pushdown partial aggregation of avg, sum, min, max,
> count.
>     Partial aggregation pushdown is fine when the following two
> conditions are both true.
>       condition1) partialaggfn is valid.
>       condition2) server_version is not less than partialagg_minversion
>     postgres_fdw executes pushdown the patialaggfunc instead of a src.
>     For example, we issue "select avg_p_int4(c) from t" instead of "select
> avg(c) from t"
>     in the above example.
> 
>     postgres_fdw can pushdown every aggregate function which supports
> partial aggregation
>     if you add a partialaggfunc corresponding to the aggregate function by
> create aggregate
>     command.
> 
>     3. difference between my patch and Mr.Pyhalov's v10 patch.
>     1) In my patch postgres_fdw can pushdown partial aggregation of avg
>     2) In my patch postgres_fdw can pushdown every aggregate function
> which supports partial
>       aggregation if you add a partialaggfunc corresponding to the
> aggregate function.
> 
>     4. sample commands in psql
>     \c postgres
>     drop database tmp;
>     create database tmp;
>     \c tmp
>     create extension postgres_fdw;
>     create server server_01 foreign data wrapper postgres_fdw
> options(host 'localhost', dbname 'tmp', server_version '160000', async_capable
> 'true');
>     create user mapping for postgres server server_01 options(user
> 'postgres', password 'postgres');
>     create server server_02 foreign data wrapper postgres_fdw
> options(host 'localhost', dbname 'tmp', server_version '160000', async_capable
> 'true');
>     create user mapping for postgres server server_02 options(user
> 'postgres', password 'postgres');
> 
>     create table t(dt timestamp, id int4, name text, total int4, val float4, type
> int4, span interval) partition by list (type);
> 
>     create table t1(dt timestamp, id int4, name text, total int4, val float4,
> type int4, span interval);
>     create table t2(dt timestamp, id int4, name text, total int4, val float4,
> type int4, span interval);
> 
>     truncate table t1;
>     truncate table t2;
>     insert into t1 select timestamp'2020-01-01' + cast(t || ' seconds' as
> interval), t % 100, 'hoge' || t, 1, 1.1, 1, cast('1 seconds' as interval) from
> generate_series(1, 100000, 1) t;
>     insert into t1 select timestamp'2020-01-01' + cast(t || ' seconds' as
> interval), t % 100, 'hoge' || t, 2, 2.1, 1, cast('2 seconds' as interval) from
> generate_series(1, 100000, 1) t;
>     insert into t2 select timestamp'2020-01-01' + cast(t || ' seconds' as
> interval), t % 100, 'hoge' || t, 1, 1.1, 2, cast('1 seconds' as interval) from
> generate_series(1, 100000, 1) t;
>     insert into t2 select timestamp'2020-01-01' + cast(t || ' seconds' as
> interval), t % 100, 'hoge' || t, 2, 2.1, 2, cast('2 seconds' as interval) from
> generate_series(1, 100000, 1) t;
> 
>     create foreign table f_t1 partition of t for values in (1) server server_01
> options(table_name 't1');
>     create foreign table f_t2 partition of t for values in (2) server server_02
> options(table_name 't2');
> 
>     set enable_partitionwise_aggregate = on;
>     explain (verbose, costs off) select avg(total::int4), avg(total::int8) from
> t;
>     select avg(total::int4), avg(total::int8) from t;
> 
>     Sincerely yours,
>     Yuuki Fujii
>     --
>     Yuuki Fujii
>     Information Technology R&D Center Mitsubishi Electric Corporation
> 
> 
> 
> Hi,
> For partial_agg_compatible :
> 
> + * Check that partial aggregate agg has compatibility
> 
> If the `agg` refers to func parameter, the parameter name is aggform
> 
> +       int32  partialagg_minversion = PG_VERSION_NUM;
> +       if (aggform->partialagg_minversion ==
> PARTIALAGG_MINVERSION_DEFAULT) {
> +               partialagg_minversion = PG_VERSION_NUM;
> 
> 
> I am curious why the same variable is assigned the same value twice. It seems
> the if block is redundant.
> 
> +       if ((fpinfo->server_version >= partialagg_minversion)) {
> +               compatible = true;
> 
> 
> The above can be simplified as: return fpinfo->server_version >=
> partialagg_minversion;
> 
> Cheers

Attachment

pgsql-hackers by date:

Previous
From: Ronan Dunklau
Date:
Subject: Re: Asynchronous execution support for Custom Scan
Next
From: vignesh C
Date:
Subject: Re: Time delayed LR (WAS Re: logical replication restrictions)