RE: Partial aggregates pushdown - Mailing list pgsql-hackers
From | Fujii.Yuki@df.MitsubishiElectric.co.jp" |
---|---|
Subject | RE: Partial aggregates pushdown |
Date | |
Msg-id | OS3PR01MB6660E2F999B97D8ADA32AE6B95159@OS3PR01MB6660.jpnprd01.prod.outlook.com Whole thread Raw |
In response to | Re: Partial aggregates pushdown (Alexander Pyhalov <a.pyhalov@postgrespro.ru>) |
Responses |
Re: Partial aggregates pushdown
|
List | pgsql-hackers |
Hi Mr.Pyhalov. Thank you for comments. > I've looked through the patch. Overall I like this approach, but have > the following comments. > > 1) Why should we require partialaggfn for min()/max()/count()? We could > just use original functions for a lot of aggregates, and so it would be > possible to push down some partial aggregates to older servers. I'm not > sure that it's a strict requirement, but a nice thing to think about. > Can we use the function itself as partialaggfn, for example, for > sum(int4)? > For functions with internal aggtranstype (like sum(int8) it > would be more difficult). Thank you. I realized that partial aggregate pushdown is fine without partialaggfn if original function has no aggfinalfn and aggtranstype of it is not internal. So I have improved v12 by this realization. However, v13 requires partialaggfn for aggregate if it has aggfinalfn or aggtranstype of it is internal such as sum(int8). > 2) fpinfo->server_version is not aggregated, for example, when we form > fpinfo in foreign_join_ok(), it seems we should spread it in more places > in postgres_fdw.c. I have responded to your comment by adding copy of server_version in merge_fdw_options. > 3) In add_foreign_grouping_paths() it seems there's no need for > additional argument, we can look at extra->patype. Also Assert() in > add_foreign_grouping_paths() will fire in --enable-cassert build. I have fixed according to your comment. > 4) Why do you modify lookup_agg_function() signature? I don't see tests, > showing that it's neccessary. Perhaps, more precise function naming > should be used instead? I realized that there is no need of modification lookup_agg_function(). Instead, I use LookupFuncName(). > 5) In tests: > - Why version_num does have "name" type in > f_alter_server_version() function? > - You modify server_version option of 'loopback' server, but > don't reset it after test. This could affect further tests. > - "It's unsafe to push down partial aggregates with distinct" > in postgres_fdw.sql:3002 seems to be misleading. > 3001 > 3002 -- It's unsafe to push down partial aggregates with distinct > 3003 SELECT f_alter_server_version('loopback', 'set', -1); I have fixed according to your comment. > 6) While looking at it, could cause a crash with something like I have fixed this problem by using LookupFuncName() instead of lookup_agg_function. The following is readme of v13. --readme of Partial aggregates push down v13 1. interface 1) pg_aggregate There are the following additional columns. a) partialaggfn data type : regproc. default value: zero(means invalid). description : This field refers to the special aggregate function(then we call this partialaggfunc) corresponding to aggregation function(then we call src) which has aggfnoid. partialaggfunc is used for partial aggregation pushdown by postgres_fdw. The followings are differences between the src and the special aggregate function. difference1) result type The result type is same as the src's transtype if the src's transtype is not internal. Otherwise the result type is bytea. difference2) final func The final func does not exist if the src's transtype is not internal. Otherwize the final func returns serialized value. For example, there is a partialaggfunc avg_p_int4 which corresponds to avg(int4) whose aggtranstype is _int4. The result value of avg_p_int4 is a float8 array which consists of count and summation. avg_p_int4 does not have finalfunc. For another example, there is a partialaggfunc avg_p_int8 which corresponds to avg(int8) whose aggtranstype is internal. The result value of avg_p_int8 is a bytea serialized array which consists of count and summation. avg_p_int8 has finalfunc int8_avg_serialize which is serialize function of avg(int8). This field is zero if there is no partialaggfunc. b) partialagg_minversion data type : int4. default value: zero(means current version). description : This field is the minimum PostgreSQL server version which has partialaggfunc. This field is used for checking compatibility of partialaggfunc. The above fields are valid in tuples for builtin avg, sum, min, max, count. There are additional records which correspond to partialaggfunc for avg, sum, min, max, count. 2) pg_proc There are additional records which correspond to partialaggfunc for avg, sum, min, max, count. 3) postgres_fdw postgres_fdw has an additional foreign server option server_version. server_version is integer value which means remote server version number. Default value of server_version is zero. server_version is used for checking compatibility of partialaggfunc. 2. feature Partial aggregation pushdown is fine when either of the following conditions is true. condition1) aggregate function has not internal aggtranstype and has no aggfinalfn. condition2) the following two conditions are both true. condition2-1) partialaggfn is valid. condition2-2) server_version is not less than partialagg_minversion postgres_fdw executes pushdown the patialaggfunc instead of a src. postgres_fdw can pushdown partial aggregation of aggregate function which has internal aggtranstype or has aggfinalfn if the function is one of avg, sum(int8), sum(numeric). For example, we issue "select avg_p_int4(c) from t" instead of "select avg(c) from t" in the above example. postgres_fdw can pushdown every aggregate function which supports partial aggregation if you add a partialaggfunc corresponding to the aggregate function by create aggregate command. 3. sample commands in psql \c postgres drop database tmp; create database tmp; \c tmp create extension postgres_fdw; create server server_01 foreign data wrapper postgres_fdw options(host 'localhost', dbname 'tmp', server_version '160000',async_capable 'true'); create user mapping for postgres server server_01 options(user 'postgres', password 'postgres'); create server server_02 foreign data wrapper postgres_fdw options(host 'localhost', dbname 'tmp', server_version '160000',async_capable 'true'); create user mapping for postgres server server_02 options(user 'postgres', password 'postgres'); create table t(dt timestamp, id int4, name text, total int4, val float4, type int4, span interval) partition by list (type); create table t1(dt timestamp, id int4, name text, total int4, val float4, type int4, span interval); create table t2(dt timestamp, id int4, name text, total int4, val float4, type int4, span interval); truncate table t1; truncate table t2; insert into t1 select timestamp'2020-01-01' + cast(t || ' seconds' as interval), t % 100, 'hoge' || t, 1, 1.1, 1, cast('1seconds' as interval) from generate_series(1, 100000, 1) t; insert into t1 select timestamp'2020-01-01' + cast(t || ' seconds' as interval), t % 100, 'hoge' || t, 2, 2.1, 1, cast('2seconds' as interval) from generate_series(1, 100000, 1) t; insert into t2 select timestamp'2020-01-01' + cast(t || ' seconds' as interval), t % 100, 'hoge' || t, 1, 1.1, 2, cast('1seconds' as interval) from generate_series(1, 100000, 1) t; insert into t2 select timestamp'2020-01-01' + cast(t || ' seconds' as interval), t % 100, 'hoge' || t, 2, 2.1, 2, cast('2seconds' as interval) from generate_series(1, 100000, 1) t; create foreign table f_t1 partition of t for values in (1) server server_01 options(table_name 't1'); create foreign table f_t2 partition of t for values in (2) server server_02 options(table_name 't2'); set enable_partitionwise_aggregate = on; explain (verbose, costs off) select avg(total::int4), avg(total::int8) from t; select avg(total::int4), avg(total::int8) from t; -- Sincerely yours, Yuuki Fujii -- Yuuki Fujii Information Technology R&D Center Mitsubishi Electric Corporation
Attachment
pgsql-hackers by date: