Re: Partial aggregates pushdown - Mailing list pgsql-hackers

From Ted Yu
Subject Re: Partial aggregates pushdown
Date
Msg-id CALte62x3WSPFptrw+wHQaQ0HR-LGT0XMSzf2Jfpd+inDDkxuyA@mail.gmail.com
Whole thread Raw
In response to RE: Partial aggregates pushdown  ("Fujii.Yuki@df.MitsubishiElectric.co.jp" <Fujii.Yuki@df.MitsubishiElectric.co.jp>)
Responses RE: [CAUTION!! freemail] Re: Partial aggregates pushdown
List pgsql-hackers


Hi Mr.Vondra, Mr.Pyhalov, Everyone.

I discussed with Mr.Pyhalov about the above draft by directly sending mail to
 him(outside of pgsql-hackers). Mr.Pyhalov allowed me to update his patch
along with the above draft. So I update Mr.Pyhalov's patch v10.

I wrote my patch for discussion.
My patch passes regression tests which contains additional basic postgres_fdw tests
for my patch's feature. But my patch doesn't contain sufficient documents and tests.
If reviewers accept my approach, I will add documents and tests to my patch.

The following is a my patch's readme.
# I simplified the above draft.

--readme of my patch
1. interface
1) pg_aggregate
There are the following additional columns.
a) partialaggfn
  data type    : regproc.
  default value: zero(means invalid).
  description  : This field refers to the special aggregate function(then we call
     this partialaggfunc)
    corresponding to aggregation function(then we call src) which has aggfnoid.
    partialaggfunc is used for partial aggregation pushdown by postgres_fdw.
    The followings are differences between the src and the special aggregate function.
      difference1) result type
        The result type is same as the src's transtype if the src's transtype
        is not internal.
        Otherwise the result type is bytea.
      difference2) final func
        The final func does not exist if the src's transtype is not internal.
        Otherwize the final func returns serialized value.
    For example, there is a partialaggfunc avg_p_int4 which corresponds to avg(int4)
    whose aggtranstype is _int4.
    The result value of avg_p_int4 is a float8 array which consists of count and
    summation. avg_p_int4 does not have finalfunc.
    For another example, there is a partialaggfunc avg_p_int8 which corresponds to
    avg(int8) whose aggtranstype is internal.
    The result value of avg_p_int8 is a bytea serialized array which consists of count
    and summation. avg_p_int8 has finalfunc int8_avg_serialize which is serialize function
    of avg(int8). This field is zero if there is no partialaggfunc.

b) partialagg_minversion
  data type    : int4.
  default value: zero(means current version).
  description  : This field is the minimum PostgreSQL server version which has
    partialaggfunc. This field is used for checking compatibility of partialaggfunc.

The above fields are valid in tuples for builtin avg, sum, min, max, count.
There are additional records which correspond to partialaggfunc for avg, sum, min, max,
count.

2) pg_proc
There are additional records which correspond to partialaggfunc for avg, sum, min, max,
count.

3) postgres_fdw
postgres_fdw has an additional foreign server option server_version. server_version is
integer value which means remote server version number. Default value of server_version
is zero. server_version is used for checking compatibility of partialaggfunc.

2. feature
postgres_fdw can pushdown partial aggregation of avg, sum, min, max, count.
Partial aggregation pushdown is fine when the following two conditions are both true.
  condition1) partialaggfn is valid.
  condition2) server_version is not less than partialagg_minversion
postgres_fdw executes pushdown the patialaggfunc instead of a src.
For example, we issue "select avg_p_int4(c) from t" instead of "select avg(c) from t"
in the above example.

postgres_fdw can pushdown every aggregate function which supports partial aggregation
if you add a partialaggfunc corresponding to the aggregate function by create aggregate
command.

3. difference between my patch and Mr.Pyhalov's v10 patch.
1) In my patch postgres_fdw can pushdown partial aggregation of avg
2) In my patch postgres_fdw can pushdown every aggregate function which supports partial
  aggregation if you add a partialaggfunc corresponding to the aggregate function.

4. sample commands in psql
\c postgres
drop database tmp;
create database tmp;
\c tmp
create extension postgres_fdw;
create server server_01 foreign data wrapper postgres_fdw options(host 'localhost', dbname 'tmp', server_version '160000', async_capable 'true');
create user mapping for postgres server server_01 options(user 'postgres', password 'postgres');
create server server_02 foreign data wrapper postgres_fdw options(host 'localhost', dbname 'tmp', server_version '160000', async_capable 'true');
create user mapping for postgres server server_02 options(user 'postgres', password 'postgres');

create table t(dt timestamp, id int4, name text, total int4, val float4, type int4, span interval) partition by list (type);

create table t1(dt timestamp, id int4, name text, total int4, val float4, type int4, span interval);
create table t2(dt timestamp, id int4, name text, total int4, val float4, type int4, span interval);

truncate table t1;
truncate table t2;
insert into t1 select timestamp'2020-01-01' + cast(t || ' seconds' as interval), t % 100, 'hoge' || t, 1, 1.1, 1, cast('1 seconds' as interval) from generate_series(1, 100000, 1) t;
insert into t1 select timestamp'2020-01-01' + cast(t || ' seconds' as interval), t % 100, 'hoge' || t, 2, 2.1, 1, cast('2 seconds' as interval) from generate_series(1, 100000, 1) t;
insert into t2 select timestamp'2020-01-01' + cast(t || ' seconds' as interval), t % 100, 'hoge' || t, 1, 1.1, 2, cast('1 seconds' as interval) from generate_series(1, 100000, 1) t;
insert into t2 select timestamp'2020-01-01' + cast(t || ' seconds' as interval), t % 100, 'hoge' || t, 2, 2.1, 2, cast('2 seconds' as interval) from generate_series(1, 100000, 1) t;

create foreign table f_t1 partition of t for values in (1) server server_01 options(table_name 't1');
create foreign table f_t2 partition of t for values in (2) server server_02 options(table_name 't2');

set enable_partitionwise_aggregate = on;
explain (verbose, costs off) select avg(total::int4), avg(total::int8) from t;
select avg(total::int4), avg(total::int8) from t;

Sincerely yours,
Yuuki Fujii
--
Yuuki Fujii
Information Technology R&D Center Mitsubishi Electric Corporation

Hi,
For partial_agg_compatible :

+ * Check that partial aggregate agg has compatibility

If the `agg` refers to func parameter, the parameter name is aggform

+       int32  partialagg_minversion = PG_VERSION_NUM;
+       if (aggform->partialagg_minversion == PARTIALAGG_MINVERSION_DEFAULT) {
+               partialagg_minversion = PG_VERSION_NUM;

I am curious why the same variable is assigned the same value twice. It seems the if block is redundant.

+       if ((fpinfo->server_version >= partialagg_minversion)) {
+               compatible = true;

The above can be simplified as: return fpinfo->server_version >= partialagg_minversion;

Cheers

pgsql-hackers by date:

Previous
From: Nathan Bossart
Date:
Subject: Re: wake up logical workers after ALTER SUBSCRIPTION
Next
From: Peter Smith
Date:
Subject: Re: Perform streaming logical transactions by background workers and parallel apply