Home > mailing lists

Re: [HACKERS] Partition-wise aggregation/grouping - Mailing list pgsql-hackers

From	Jeevan Chalke
Subject	Re: [HACKERS] Partition-wise aggregation/grouping
Date	August 23, 2017 17:13:04
Msg-id	CAM2+6=UqFnFUypOvLdm5TgC+2M=-E0Q7_LOh0VDFFzmk2BBPzQ@mail.gmail.com Whole thread Raw
In response to	[HACKERS] Partition-wise aggregation/grouping (Jeevan Chalke <jeevan.chalke@enterprisedb.com>)
Responses	Re: [HACKERS] Partition-wise aggregation/grouping (Jeevan Chalke <jeevan.chalke@enterprisedb.com>)
List	pgsql-hackers

Tree view

Hi,

Attached is the patch to implement partition-wise aggregation/grouping.

As explained earlier, we produce a full aggregation for each partition when
partition keys are leading group by clauses and then append is performed.
Else we do a partial aggregation on each partition, append them and then add
finalization step over it.

I have observed that cost estimated for partition-wise aggregation and cost
for the plans without partition-wise aggregation is almost same. However,
execution time shows significant improvement (as explained my in the very
first email) with partition-wise aggregates. Planner chooses a plan according
to the costs, and thus most of the time plan without partition-wise
aggregation is chosen. Hence, to force partition-wise plans and for the
regression runs, I have added a GUC named partition_wise_agg_cost_factor to
adjust the costings.

This feature is only used when enable_partition_wise_agg GUC is set to on.

Here are the details of the patches in the patch-set:

0001 - Refactors sort and hash final grouping paths into separate functions.
Since partition-wise aggregation too builds paths same as that of
create_grouping_paths(), separated path creation for sort and hash agg into
separate functions. These functions later used by main partition-wise
aggregation/grouping patch.

0002 - Passes targetlist to get_number_of_groups().
We need to estimate groups for individual child relations and thus need to
pass targetlist corresponding to the child rel.

0003 - Adds enable_partition_wise_agg and partition_wise_agg_cost_factor GUCs.

0004 - Implements partition-wise aggregation.

0005 - Adds test-cases.

0006 - postgres_fdw changes which enable pushing aggregation for other upper
relations.

Since this patch is highly dependent on partition-wise join [1], one needs to
apply all those patches on HEAD (my repository head was at:
66ed3829df959adb47f71d7c903ac59f0670f3e1) before applying these patches in
order.

Suggestions / feedback / inputs ?

[1] https://www.postgresql.org/message-id/CAFjFpRd9Vqh_=-Ldv-XqWY006d07TJ+VXuhXCbdj=P1jukYBrw@mail.gmail.com

On Tue, Mar 21, 2017 at 12:47 PM, Jeevan Chalke <jeevan.chalke@enterprisedb.com> wrote:

Hi all,

Declarative partitioning is supported in PostgreSQL 10 and work is already in
progress to support partition-wise joins. Here is a proposal for partition-wise
aggregation/grouping. Our initial performance measurement has shown 7 times
performance when partitions are on foreign servers and approximately 15% when
partitions are local.

Partition-wise aggregation/grouping computes aggregates for each partition
separately. If the group clause contains the partition key, all the rows
belonging to a given group come from one partition, thus allowing aggregates
to be computed completely for each partition. Otherwise, partial aggregates
computed for each partition are combined across the partitions to produce the
final aggregates. This technique improves performance because:
i. When partitions are located on foreign server, we can push down the
aggregate to the foreign server.
ii. If hash table for each partition fits in memory, but that for the whole
relation does not, each partition-wise aggregate can use an in-memory hash
table.
iii. Aggregation at the level of partitions can exploit properties of
partitions like indexes, their storage etc.

Attached an experimental patch for the same based on the partition-wise join
patches posted in [1].

This patch currently implements partition-wise aggregation when group clause
contains the partitioning key. A query below, involving a partitioned table
with 3 partitions containing 1M rows each, producing total 30 groups showed
15% improvement over non-partition-wise aggregation. Same query showed 7 times
improvement when the partitions were located on the foreign servers.

Here is the sample plan:

postgres=# set enable_partition_wise_agg to true;
SET
postgres=# EXPLAIN ANALYZE SELECT a, count(*) FROM plt1 GROUP BY a;
                                                  QUERY PLAN
--------------------------------------------------------------------------------------------------------------
Append (cost=5100.00..61518.90 rows=30 width=12) (actual time=324.837..944.804 rows=30 loops=1)
   -> Foreign Scan (cost=5100.00..20506.30 rows=10 width=12) (actual time=324.837..324.838 rows=10 loops=1)
         Relations: Aggregate on (public.fplt1_p1 plt1)
   -> Foreign Scan (cost=5100.00..20506.30 rows=10 width=12) (actual time=309.954..309.956 rows=10 loops=1)
         Relations: Aggregate on (public.fplt1_p2 plt1)
   -> Foreign Scan (cost=5100.00..20506.30 rows=10 width=12) (actual time=310.002..310.004 rows=10 loops=1)
         Relations: Aggregate on (public.fplt1_p3 plt1)
Planning time: 0.370 ms
Execution time: 945.384 ms
(9 rows)

postgres=# set enable_partition_wise_agg to false;
SET
postgres=# EXPLAIN ANALYZE SELECT a, count(*) FROM plt1 GROUP BY a;
                                                              QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------
HashAggregate (cost=121518.01..121518.31 rows=30 width=12) (actual time=6498.452..6498.459 rows=30 loops=1)
   Group Key: plt1.a
   -> Append (cost=0.00..106518.00 rows=3000001 width=4) (actual time=0.595..5769.592 rows=3000000 loops=1)
         -> Seq Scan on plt1 (cost=0.00..0.00 rows=1 width=4) (actual time=0.007..0.007 rows=0 loops=1)
         -> Foreign Scan on fplt1_p1 (cost=100.00..35506.00 rows=1000000 width=4) (actual time=0.587..1844.506 rows=1000000 loops=1)
         -> Foreign Scan on fplt1_p2 (cost=100.00..35506.00 rows=1000000 width=4) (actual time=0.384..1839.633 rows=1000000 loops=1)
         -> Foreign Scan on fplt1_p3 (cost=100.00..35506.00 rows=1000000 width=4) (actual time=0.402..1876.505 rows=1000000 loops=1)
Planning time: 0.251 ms
Execution time: 6499.018 ms
(9 rows)

Patch needs a lot of improvement including:
1. Support for partial partition-wise aggregation
2. Estimating number of groups for every partition
3. Estimating cost of partition-wise aggregation based on sample partitions
similar to partition-wise join
and much more.

In order to support partial aggregation on foreign partitions, we need support
to fetch partially aggregated results from the foreign server. That can be
handled as a separate follow-on patch.

Though is lot of work to be done, I would like to get suggestions/opinions from
hackers.

I would like to thank Ashutosh Bapat for providing a draft patch and helping
me off-list on this feature while he is busy working on partition-wise join
feature.

[1] https://www.postgresql.org/message-id/CAFjFpRcbY2QN3cfeMTzVEoyF5Lfku-ijyNR%3DPbXj1e%3D9a%3DqMoQ%40mail.gmail.com

Thanks

--
Jeevan Chalke
Principal Software Engineer, Product Development
EnterpriseDB Corporation
The Enterprise PostgreSQL Company

Jeevan Chalke
Principal Software Engineer, Product Development
EnterpriseDB Corporation
The Enterprise PostgreSQL Company

Attachment

partition-wise-agg-v1.tar.gz

pgsql-hackers by date:

From: Andres Freund
Date: 23 August 2017, 15:40:53
Subject: Re: [HACKERS] PATCH: Batch/pipelining support for libpq

From: Ashutosh Bapat
Date: 23 August 2017, 17:16:10
Subject: Re: [HACKERS] path toward faster partition pruning

Re: [HACKERS] Partition-wise aggregation/grouping - Mailing list pgsql-hackers

Attachment

Previous

Next