[HACKERS] Partition-wise aggregation/grouping - Mailing list pgsql-hackers

From Jeevan Chalke
Subject [HACKERS] Partition-wise aggregation/grouping
Date
Msg-id CAM2+6=V64_xhstVHie0Rz=KPEQnLJMZt_e314P0jaT_oJ9MR8A@mail.gmail.com
Whole thread Raw
Responses Re: [HACKERS] Partition-wise aggregation/grouping  (Antonin Houska <ah@cybertec.at>)
Re: [HACKERS] Partition-wise aggregation/grouping  (Jeevan Chalke <jeevan.chalke@enterprisedb.com>)
List pgsql-hackers
Hi all,

Declarative partitioning is supported in PostgreSQL 10 and work is already in
progress to support partition-wise joins. Here is a proposal for partition-wise
aggregation/grouping.  Our initial performance measurement has shown 7 times
performance when partitions are on foreign servers and approximately 15% when
partitions are local.

Partition-wise aggregation/grouping computes aggregates for each partition
separately.  If the group clause contains the partition key, all the rows
belonging to a given group come from one partition, thus allowing aggregates
to be computed completely for each partition.  Otherwise, partial aggregates
computed for each partition are combined across the partitions to produce the
final aggregates. This technique improves performance because:
i. When partitions are located on foreign server, we can push down the
aggregate to the foreign server.
ii. If hash table for each partition fits in memory, but that for the whole
relation does not, each partition-wise aggregate can use an in-memory hash
table.
iii. Aggregation at the level of partitions can exploit properties of
partitions like indexes, their storage etc.

Attached an experimental patch for the same based on the partition-wise join
patches posted in [1].

This patch currently implements partition-wise aggregation when group clause
contains the partitioning key.  A query below, involving a partitioned table
with 3 partitions containing 1M rows each, producing total 30 groups showed
15% improvement over non-partition-wise aggregation. Same query showed 7 times
improvement when the partitions were located on the foreign servers.

Here is the sample plan:

postgres=# set enable_partition_wise_agg to true;
SET
postgres=# EXPLAIN ANALYZE SELECT a, count(*) FROM plt1 GROUP BY a;
                                                  QUERY PLAN                                                 
--------------------------------------------------------------------------------------------------------------
 Append  (cost=5100.00..61518.90 rows=30 width=12) (actual time=324.837..944.804 rows=30 loops=1)
   ->  Foreign Scan  (cost=5100.00..20506.30 rows=10 width=12) (actual time=324.837..324.838 rows=10 loops=1)
         Relations: Aggregate on (public.fplt1_p1 plt1)
   ->  Foreign Scan  (cost=5100.00..20506.30 rows=10 width=12) (actual time=309.954..309.956 rows=10 loops=1)
         Relations: Aggregate on (public.fplt1_p2 plt1)
   ->  Foreign Scan  (cost=5100.00..20506.30 rows=10 width=12) (actual time=310.002..310.004 rows=10 loops=1)
         Relations: Aggregate on (public.fplt1_p3 plt1)
 Planning time: 0.370 ms
 Execution time: 945.384 ms
(9 rows)

postgres=# set enable_partition_wise_agg to false;
SET
postgres=# EXPLAIN ANALYZE SELECT a, count(*) FROM plt1 GROUP BY a;
                                                              QUERY PLAN                                                              
---------------------------------------------------------------------------------------------------------------------------------------
 HashAggregate  (cost=121518.01..121518.31 rows=30 width=12) (actual time=6498.452..6498.459 rows=30 loops=1)
   Group Key: plt1.a
   ->  Append  (cost=0.00..106518.00 rows=3000001 width=4) (actual time=0.595..5769.592 rows=3000000 loops=1)
         ->  Seq Scan on plt1  (cost=0.00..0.00 rows=1 width=4) (actual time=0.007..0.007 rows=0 loops=1)
         ->  Foreign Scan on fplt1_p1  (cost=100.00..35506.00 rows=1000000 width=4) (actual time=0.587..1844.506 rows=1000000 loops=1)
         ->  Foreign Scan on fplt1_p2  (cost=100.00..35506.00 rows=1000000 width=4) (actual time=0.384..1839.633 rows=1000000 loops=1)
         ->  Foreign Scan on fplt1_p3  (cost=100.00..35506.00 rows=1000000 width=4) (actual time=0.402..1876.505 rows=1000000 loops=1)
 Planning time: 0.251 ms
 Execution time: 6499.018 ms
(9 rows)

Patch needs a lot of improvement including:
1. Support for partial partition-wise aggregation
2. Estimating number of groups for every partition
3. Estimating cost of partition-wise aggregation based on sample partitions
similar to partition-wise join
and much more.

In order to support partial aggregation on foreign partitions, we need support
to fetch partially aggregated results from the foreign server. That can be
handled as a separate follow-on patch.

Though is lot of work to be done, I would like to get suggestions/opinions from
hackers.

I would like to thank Ashutosh Bapat for providing a draft patch and helping
me off-list on this feature while he is busy working on partition-wise join
feature.

[1] https://www.postgresql.org/message-id/CAFjFpRcbY2QN3cfeMTzVEoyF5Lfku-ijyNR%3DPbXj1e%3D9a%3DqMoQ%40mail.gmail.com

Thanks

--
Jeevan Chalke
Principal Software Engineer, Product Development
EnterpriseDB Corporation
The Enterprise PostgreSQL Company

Attachment

pgsql-hackers by date:

Previous
From: Haribabu Kommi
Date:
Subject: Re: [HACKERS] Refactor handling of database attributes between pg_dump and pg_dumpall
Next
From: Anastasia Lubennikova
Date:
Subject: Re: [HACKERS] btree_gin and btree_gist for enums