Re: [HACKERS] Partition-wise aggregation/grouping - Mailing list pgsql-hackers
From: Ashutosh Bapat
Subject: Re: [HACKERS] Partition-wise aggregation/grouping
Msg-id: CAFjFpReOmtwo32YuH7hkYy3e1wTKQwvEd8q1ryVjWPknkY+_Yw@mail.gmail.com
In response to: Re: [HACKERS] Partition-wise aggregation/grouping (Ashutosh Bapat <ashutosh.bapat@enterprisedb.com>)
Responses: Re: [HACKERS] Partition-wise aggregation/grouping
           Re: [HACKERS] Partition-wise aggregation/grouping
List: pgsql-hackers
Here are review comments for 0009.

Only full aggregation is pushed to the remote server. I think we can live
with that for a while, but we need to be able to push down partial
aggregates to the foreign server. I agree that it needs some infrastructure
to serialize and deserialize the partial aggregate values, support
unpartitioned aggregation first and then work on partitioned aggregates.
That is definitely a separate piece of work.

+-- ===================================================================
+-- test partition-wise-aggregates
+-- ===================================================================
+CREATE TABLE pagg_tab (a int, b int, c text) PARTITION BY RANGE(a);
+CREATE TABLE pagg_tab_p1 (a int, b int, c text);
+CREATE TABLE pagg_tab_p2 (a int, b int, c text);
+CREATE TABLE pagg_tab_p3 (a int, b int, c text);

Like the partition-wise join testcases, please use LIKE so that it's easy
to change the table schema if required.

+INSERT INTO pagg_tab_p1 SELECT i % 30, i % 50, to_char(i/30, 'FM0000') FROM generate_series(1, 3000) i WHERE (i % 30) < 10;
+INSERT INTO pagg_tab_p2 SELECT i % 30, i % 50, to_char(i/30, 'FM0000') FROM generate_series(1, 3000) i WHERE (i % 30) < 20 and (i % 30) >= 10;
+INSERT INTO pagg_tab_p3 SELECT i % 30, i % 50, to_char(i/30, 'FM0000') FROM generate_series(1, 3000) i WHERE (i % 30) < 30 and (i % 30) >= 20;

We have to do this because INSERT tuple routing to a foreign partition is
not supported right now. Somebody has to remember to change this to a
single statement once that's done.
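Just to illustrate the shape I have in mind for both points above (untested,
and assuming the same table and column names as in the test file):

CREATE TABLE pagg_tab_p1 (LIKE pagg_tab);
CREATE TABLE pagg_tab_p2 (LIKE pagg_tab);
CREATE TABLE pagg_tab_p3 (LIKE pagg_tab);

-- Once INSERT tuple routing to foreign partitions is supported, the three
-- per-partition INSERTs above could collapse into a single statement that
-- lets the server route the rows:
INSERT INTO pagg_tab
  SELECT i % 30, i % 50, to_char(i/30, 'FM0000') FROM generate_series(1, 3000) i;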
+ANALYZE fpagg_tab_p1;
+ANALYZE fpagg_tab_p2;
+ANALYZE fpagg_tab_p3;

I thought this is not needed; when you ANALYZE the partitioned table, it
would analyze the partitions as well. But I see that partition-wise join is
also ANALYZING the foreign partitions separately. When I ran ANALYZE on a
partitioned table with foreign partitions, statistics were updated only for
the local tables (the partitioned table and local partitions). Of course
this is separate work, but it probably needs to be fixed.

+-- When GROUP BY clause matches with PARTITION KEY.
+-- Plan when partition-wise-agg is disabled

s/when/with/

+-- Plan when partition-wise-agg is enabled

s/when/with/

+                   ->  Append

Just like a ForeignScan node's Relations tell what kind of ForeignScan this
is, maybe we should annotate Append to tell whether the children are joins,
aggregates or relation scans. That might be helpful. Of course, as another
patch.

+SELECT a, sum(b), min(b), count(*) FROM pagg_tab GROUP BY a HAVING avg(b) < 25 ORDER BY 1;
+ a  | sum  | min | count
+----+------+-----+-------
+  0 | 2000 |   0 |   100
+  1 | 2100 |   1 |   100
[ ... clipped ... ]
+ 23 | 2300 |   3 |   100
+ 24 | 2400 |   4 |   100
+(15 rows)

Maybe we want to reduce the number of rows to a few by using a stricter
HAVING clause?

+
+-- When GROUP BY clause not matches with PARTITION KEY.

... clause does not match ...

+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT b, avg(a), max(a), count(*) FROM pagg_tab GROUP BY b HAVING sum(a) < 800 ORDER BY 1;
+                                   QUERY PLAN
+-----------------------------------------------------------------------------------------------------------------------------------------------------
+ Sort
+   Output: fpagg_tab_p1.b, (avg(fpagg_tab_p1.a)), (max(fpagg_tab_p1.a)), (count(*))
+   ->  Partial HashAggregate
[ ... clipped ... ]
+         Output: fpagg_tab_p3.b, PARTIAL avg(fpagg_tab_p3.a), PARTIAL max(fpagg_tab_p3.a), PARTIAL count(*), PARTIAL sum(fpagg_tab_p3.a)
+         Group Key: fpagg_tab_p3.b
+         ->  Foreign Scan on public.fpagg_tab_p3
+               Output: fpagg_tab_p3.b, fpagg_tab_p3.a
+               Remote SQL: SELECT a, b FROM public.pagg_tab_p3
+(26 rows)

I think we are interested in the overall shape of the plan and not the
details of Remote SQL etc., so maybe turn off VERBOSE. This comment applies
to an earlier plan with enable_partition_wise_agg = false.

+
+SELECT b, avg(a), max(a), count(*) FROM pagg_tab GROUP BY b HAVING sum(a) < 800 ORDER BY 1;
+  b |         avg         | max | count
+----+---------------------+-----+-------
+  0 | 10.0000000000000000 |  20 |    60
+  1 | 11.0000000000000000 |  21 |    60
[ ... clipped ... ]
+ 42 | 12.0000000000000000 |  22 |    60
+ 43 | 13.0000000000000000 |  23 |    60
+(20 rows)

Since the aggregates were not pushed down, I doubt we should be testing the
output. But this test is good for checking partial aggregates over foreign
partition scans, which I think we don't have in postgres_fdw.sql. So maybe
add it as a separate patch? Can you please also add a test where we
reference a whole-row? That usually has troubles (a rough sketch is at the
end of this review).

-    if (root->hasHavingQual && query->havingQual)
+    if (root->hasHavingQual && fpinfo->havingQual)

This is not exactly a problem with your patch, but why do we need to check
both the boolean and the actual clauses? If the boolean is true,
query->havingQual should be non-NULL, and NULL otherwise.

     /* Grouping information */
     List       *grouped_tlist;
+    PathTarget *grouped_target;
+    Node       *havingQual;

I think we don't need havingQual as a separate member. foreign_grouping_ok()
separates the clauses in havingQual into shippable and non-shippable clauses
and saves them in local_conditions and remote_conditions. Probably we want
to use those instead of adding a new member.

index 04e43cc..c8999f6 100644
--- a/src/include/foreign/fdwapi.h
+++ b/src/include/foreign/fdwapi.h
@@ -62,7 +62,8 @@ typedef void (*GetForeignJoinPaths_function) (PlannerInfo *root,
 typedef void (*GetForeignUpperPaths_function) (PlannerInfo *root,
                                                UpperRelationKind stage,
                                                RelOptInfo *input_rel,
-                                               RelOptInfo *output_rel);
+                                               RelOptInfo *output_rel,
+                                               UpperPathExtraData *extra);

Probably this comment belongs to 0007, but it's in this patch that it
becomes clear how invasive the UpperPathExtraData changes are. While
UpperPathExtraData has "upper paths" in the name, all of its members are
related to grouping. That's fine since we only support partition-wise
aggregate and not the other upper operations. But if we were to do that in
future, which of these members would be applicable to other upper
relations? inputRows, pathTarget and partialPathTarget may be applicable to
other upper rels as well. can_sort and can_hash may be applicable to
DISTINCT and SORT relations. isPartial and havingQual will be applicable
only to Grouping/Aggregation. So maybe it's ok, and like RelOptInfo we may
separate them by comments.

Another problem with that structure is that its name doesn't mention that
the structure is used only for child upper relations, whereas the code
assumes that if extra is not present it's a parent upper relation. Maybe we
want to rename it to that effect, or always use it, whether for a parent or
a child relation. We may also want to rename pathTarget and
partialPathTarget to relTarget and partialRelTarget, since those targets
are not specific to any path but will be applicable to all the paths
created for that rel.
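For the whole-row test mentioned above, something along these lines
(untested, just a rough sketch using the same test tables) should exercise
that path:

-- Whole-row reference in the aggregate; this usually trips up the
-- pushdown and deparsing code paths.
EXPLAIN (COSTS OFF)
SELECT a, count(t) FROM pagg_tab t GROUP BY a ORDER BY 1;
SELECT a, count(t) FROM pagg_tab t GROUP BY a ORDER BY 1;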
On Mon, Dec 4, 2017 at 7:44 AM, Ashutosh Bapat
<ashutosh.bapat@enterprisedb.com> wrote:
> On Sat, Dec 2, 2017 at 4:08 AM, Robert Haas <robertmhaas@gmail.com> wrote:
>> On Fri, Dec 1, 2017 at 7:41 AM, Ashutosh Bapat
>> <ashutosh.bapat@enterprisedb.com> wrote:
>>> This code creates plans where there are multiple Gather nodes under an Append
>>> node.
>>
>> We should avoid that. Starting and stopping workers is inefficient,
>> and precludes things like turning the Append into a Parallel Append.
>
> Ah, I didn't think about it. Thanks for bringing it up.
>
>>> AFAIU, the workers assigned to one gather node can be reused until that
>>> Gather node finishes. Having multiple Gather nodes under an Append mean that
>>> every worker will be idle from the time that worker finishes the work till the
>>> last worker finishes the work.
>>
>> No, workers will exit as soon as they finish. They don't hang around idle.
>
> Sorry, I think I used wrong word "idle". I meant that if a worker
> finishes and exists, the query can't use it that worker slot until the
> next Gather node starts. But as you pointed out, starting and stopping
> a worker is costlier than the cost of not using the slot. So we should
> avoid such plans.
>
>>> index b422050..1941468 100644
>>> --- a/src/tools/pgindent/typedefs.list
>>> +++ b/src/tools/pgindent/typedefs.list
>>> @@ -2345,6 +2345,7 @@ UnlistenStmt
>>>  UnresolvedTup
>>>  UnresolvedTupData
>>>  UpdateStmt
>>> +UpperPathExtraData
>>>  UpperRelationKind
>>>  UpperUniquePath
>>>  UserAuth
>>>
>>> Do we commit this file as part of the feature?
>>
>> Andres and I regularly commit such changes; Tom rejects them.
>
> We will leave it to the committer to decide what to do with this hunk.
>
> --
> Best Wishes,
> Ashutosh Bapat
> EnterpriseDB Corporation
> The Postgres Database Company

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company