Re: [HACKERS] Partition-wise aggregation/grouping - Mailing list pgsql-hackers
| From | Konstantin Knizhnik |
|---|---|
| Subject | Re: [HACKERS] Partition-wise aggregation/grouping |
| Date | |
| Msg-id | a6e8b255-8854-8c8e-b565-1c0a943bbf23@postgrespro.ru |
| In response to | Re: [HACKERS] Partition-wise aggregation/grouping (Konstantin Knizhnik <k.knizhnik@postgrespro.ru>) |
| List | pgsql-hackers |
On 11.11.2017 23:29, Konstantin Knizhnik wrote:
> On 10/27/2017 02:01 PM, Jeevan Chalke wrote:
>> Hi,
>>
>> Attached new patch-set here. Changes include:
>>
>> 1. Added separate patch for costing Append node as discussed up-front in the
>>    patch-set.
>> 2. Since we now cost Append node, we don't need partition_wise_agg_cost_factor
>>    GUC. So removed that. The remaining patch hence merged into main
>>    implementation patch.
>> 3. Updated rows in test-cases so that we will get partition-wise plans.
>>
>> Thanks
>
> I applied the partition-wise-agg-v6.tar.gz patch to master and used the
> shard.sh example from
> https://www.postgresql.org/message-id/14577.1509723225%40localhost
> The plan for count(*) is the following:
>
> shard=# explain select count(*) from orders;
>                                        QUERY PLAN
> ---------------------------------------------------------------------------------------
>  Finalize Aggregate  (cost=100415.29..100415.30 rows=1 width=8)
>    ->  Append  (cost=50207.63..100415.29 rows=2 width=8)
>          ->  Partial Aggregate  (cost=50207.63..50207.64 rows=1 width=8)
>                ->  Foreign Scan on orders_0  (cost=101.00..50195.13 rows=5000 width=0)
>          ->  Partial Aggregate  (cost=50207.63..50207.64 rows=1 width=8)
>                ->  Foreign Scan on orders_1  (cost=101.00..50195.13 rows=5000 width=0)
>
> We really do calculate a partial aggregate for each partition, but to do so we
> still have to fetch all the data from the remote host.
> So for foreign partitions such plans are absolutely inefficient.
> Maybe this should be combined with some other patch?
> For example, with the agg_pushdown_v4.tgz patch
> https://www.postgresql.org/message-id/14577.1509723225%40localhost ?
> But it does not apply on top of the partition-wise-agg-v6.tar.gz patch.
> Also, postgres_fdw in 11dev is able to push down aggregates without the
> agg_pushdown_v4.tgz patch.
>
> In 0009-Teach-postgres_fdw-to-push-aggregates-for-child-rela.patch
> there is the following check:
>
>     /* Partial aggregates are not supported. */
>     + if (extra->isPartial)
>     +     return;
>
> If we just comment out this check then the produced plan will be the following:
>
> shard=# explain select sum(product_id) from orders;
>                            QUERY PLAN
> ----------------------------------------------------------------
>  Finalize Aggregate  (cost=308.41..308.42 rows=1 width=8)
>    ->  Append  (cost=144.18..308.41 rows=2 width=8)
>          ->  Foreign Scan  (cost=144.18..154.20 rows=1 width=8)
>                Relations: Aggregate on (public.orders_0 orders)
>          ->  Foreign Scan  (cost=144.18..154.20 rows=1 width=8)
>                Relations: Aggregate on (public.orders_1 orders)
> (6 rows)
>
> And that is actually the desired plan!
> Obviously such an approach will not always work: FDW really doesn't support
> partial aggregates now.
> But for the most frequently used aggregates (sum, min, max, count)
> aggtype == aggtranstype and there is no difference between partial and normal
> aggregate calculation.
> So instead of the (extra->isPartial) condition we can add a more complex check
> which will traverse the pathtarget expressions and check whether they can be
> evaluated in this way. Or... extend the FDW API to support partial aggregation.
>
> But even the last plan is not ideal: it will run the remote aggregate queries
> at the nodes sequentially, one after another.
> There is the parallel append patch:
> https://www.postgresql.org/message-id/CAJ3gD9ctEcrVUmpY6fq_JUB6WDKGXAGd70EY68jVFA4kxMbKeQ%40mail.gmail.com
> but... FDW doesn't support parallel scan, so parallel append cannot be applied
> in this case.
> And we actually do not need parallel append with all its dynamic workers here.
> We just need to start the commands at all remote servers and only after that
> fetch the results (which can be done sequentially).
>
> I am investigating the problem of efficient execution of OLAP queries on
> sharded tables (tables with remote partitions).
> After reading all these threads and the corresponding patches, it seems to me
> that we already have most of the pieces of the puzzle; what we need is to put
> them in the right places and maybe add the missing ones.
> I wonder whether somebody is already busy with this, and whether I can somehow
> help here?
>
> Also, I am not quite sure about the best approach to parallel execution of a
> distributed query at all nodes.
> Should we make postgres_fdw parallel safe and use parallel append? How
> difficult would that be?
> Or, in addition to parallel append, should we also have an "asynchronous
> append" which will be able to initiate execution at all nodes?
> It seems to be close to merge append, because it should simultaneously
> traverse all cursors.
>
> The second approach looks easier to implement. But in the case of a sharded
> table, a distributed query may need to traverse both remote and local shards,
> and this approach doesn't allow several local shards to be processed in
> parallel.
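To make that last point concrete, here is a minimal standalone libpq sketch of the
"start everywhere first, fetch afterwards" idea. It is only an illustration, not the
attached patch: the connection strings and the per-shard queries below are
placeholders for whatever postgres_fdw would actually build and send.

#include <stdio.h>
#include <libpq-fe.h>

int
main(void)
{
    /* Placeholder connection strings for the two shards of the example. */
    const char *conninfo[2] = {
        "host=shard1 dbname=shard",
        "host=shard2 dbname=shard"
    };
    /* Placeholder queries; postgres_fdw would build the pushed-down SQL itself. */
    const char *query[2] = {
        "SELECT sum(product_id) FROM orders_0",
        "SELECT sum(product_id) FROM orders_1"
    };
    PGconn     *conn[2];
    int         i;

    /* Phase 1: dispatch the query to every shard without waiting for results. */
    for (i = 0; i < 2; i++)
    {
        conn[i] = PQconnectdb(conninfo[i]);
        if (PQstatus(conn[i]) != CONNECTION_OK)
        {
            fprintf(stderr, "connection to shard %d failed: %s",
                    i, PQerrorMessage(conn[i]));
            return 1;
        }
        if (!PQsendQuery(conn[i], query[i]))
        {
            fprintf(stderr, "dispatch to shard %d failed: %s",
                    i, PQerrorMessage(conn[i]));
            return 1;
        }
    }

    /*
     * Phase 2: fetch the results sequentially.  The shards have been computing
     * their aggregates concurrently while we were dispatching and waiting.
     */
    for (i = 0; i < 2; i++)
    {
        PGresult   *res;

        while ((res = PQgetResult(conn[i])) != NULL)
        {
            if (PQresultStatus(res) == PGRES_TUPLES_OK)
                printf("shard %d: sum = %s\n", i, PQgetvalue(res, 0, 0));
            PQclear(res);
        }
        PQfinish(conn[i]);
    }
    return 0;
}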
I attach a small patch for postgres_fdw.c which allows concurrent execution of
aggregates by all remote servers (when they are accessed through postgres_fdw).
I have added a "postgres_fdw.use_prefetch" GUC to enable/disable prefetching of
data in postgres_fdw.
This patch should be applied on top of the partition-wise-agg-v6.tar.gz patch.

With the shard example and the following two GUCs set, the sum aggregate is
calculated in parallel by both servers:

shard=# set postgres_fdw.use_prefetch=on;
shard=# set enable_partition_wise_agg=on;
shard=# select sum(product_id) from orders;
   sum
---------
 9965891
(1 row)

shard=# explain select sum(product_id) from orders;
                           QUERY PLAN
----------------------------------------------------------------
 Finalize Aggregate  (cost=308.41..308.42 rows=1 width=8)
   ->  Append  (cost=144.18..308.41 rows=2 width=8)
         ->  Foreign Scan  (cost=144.18..154.20 rows=1 width=8)
               Relations: Aggregate on (public.orders_0 orders)
         ->  Foreign Scan  (cost=144.18..154.20 rows=1 width=8)
               Relations: Aggregate on (public.orders_1 orders)
(6 rows)

I have not tested it much; it is just a proof of concept.

-- 
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
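P.S. To illustrate the "more complex check" suggested in the quoted mail, instead
of the blanket (extra->isPartial) bail-out, here is a rough sketch of the kind of
test I have in mind. It is not part of any posted patch; the helper name is made
up, and a complete version would also need to walk nested expressions (e.g. with
expression_tree_walker) rather than looking only at the top level of the grouping
target.

#include "postgres.h"

#include "access/htup_details.h"
#include "catalog/pg_aggregate.h"
#include "nodes/relation.h"
#include "utils/lsyscache.h"
#include "utils/syscache.h"

/*
 * Return true when every top-level aggregate in the grouping target has a
 * partial (transition) value that is already its final value: the transition
 * type matches the aggregate's declared result type and there is no final
 * function.  This covers count, sum on integer types, min and max, but not,
 * for example, avg or sum(numeric).
 */
static bool
partial_aggs_are_trivial(PathTarget *target)
{
    ListCell   *lc;

    foreach(lc, target->exprs)
    {
        Node       *node = (Node *) lfirst(lc);
        Aggref     *aggref;
        HeapTuple   tuple;
        Form_pg_aggregate aggform;
        bool        trivial;

        if (!IsA(node, Aggref))
            continue;
        aggref = (Aggref *) node;

        tuple = SearchSysCache1(AGGFNOID, ObjectIdGetDatum(aggref->aggfnoid));
        if (!HeapTupleIsValid(tuple))
            return false;
        aggform = (Form_pg_aggregate) GETSTRUCT(tuple);

        /* The partial value is usable as-is only without a final function. */
        trivial = (!OidIsValid(aggform->aggfinalfn) &&
                   aggform->aggtranstype == get_func_rettype(aggref->aggfnoid));

        ReleaseSysCache(tuple);

        if (!trivial)
            return false;
    }

    return true;
}

Plugged in where the 0009 patch currently tests extra->isPartial (presumably in
add_foreign_grouping_paths), such a check would let trivially combinable
aggregates be pushed down whole while still refusing the cases that genuinely
need partial-aggregate support in the FDW.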