Re: Parallel grouping sets - Mailing list pgsql-hackers

From: Richard Guo
Subject: Re: Parallel grouping sets
Msg-id: CAN_9JTwtzttEmdXvMbJqXt=51kXiBTCKEPKq6kk2PZ6Xz6m5ig@mail.gmail.com
In response to: Re: Parallel grouping sets (Jesse Zhang <sbjesse@gmail.com>)
Responses: Re: Parallel grouping sets
List: pgsql-hackers
Hi Jesse,
Thanks for reviewing these two patches.
On Sat, Jan 25, 2020 at 6:52 AM Jesse Zhang <sbjesse@gmail.com> wrote:
> I glanced over both patches. Just the opposite, I have a hunch that v3
> is always better than v5. Here's my 6-minute understanding of both.
>
> v5 (the one with a simple partial aggregate) works by pushing a little
> bit of partial aggregation onto the workers, and performing a grouping
> aggregate above the Gather. This has two interesting outcomes: we can
> execute an unmodified partial aggregate on the workers, and an almost
> unmodified rollup aggregate once the trans values are gathered. A
> parallel plan for a query like
>
>     SELECT count(*) FROM foo GROUP BY GROUPING SETS ((a), (b), (c), ());
>
> can be
>     Finalize GroupAggregate
>       Output: count(*)
>       Group Key: a
>       Group Key: b
>       Group Key: c
>       Group Key: ()
>       Gather Merge
>         Partial GroupAggregate
>           Output: PARTIAL count(*)
>           Group Key: a, b, c
>           Sort
>             Sort Key: a, b, c
>             Parallel Seq Scan on foo
Yes, this is the idea of the v5 patch.
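For intuition, here is a rough plain-SQL sketch of what the two phases of
this plan shape compute (assuming a hypothetical table foo(a int, b int,
c int); it only emulates the plan shape, not the actual executor behavior):

    -- "Partial" phase: each worker groups by the union of all
    -- grouping-set columns and emits one transition value per group.
    WITH partial AS (
        SELECT a, b, c, count(*) AS partial_count
        FROM foo
        GROUP BY a, b, c
    )
    -- "Finalize" phase: the leader runs the (almost) unmodified rollup
    -- over the gathered rows; count(*) is combined by summing the
    -- partial counts.
    SELECT a, b, c, sum(partial_count) AS cnt
    FROM partial
    GROUP BY GROUPING SETS ((a), (b), (c), ());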
v3 ("the one with grouping set id") really turns the plan from a tree to
a multiplexed pipe: we can execute grouping aggregate on the workers,
but only partially. When we emit the trans values, also tag the tuple
with a group id. After gather, finalize the aggregates with a modified
grouping aggregate. Unlike a non-split grouping aggregate, the finalize
grouping aggregate does not "flow" the results from one rollup to the
next one. Instead, each group only advances on partial inputs tagged for
the group.
>     Finalize HashAggregate
>       Output: count(*)
>       Dispatched by: (GroupingSetID())
>       Group Key: a
>       Group Key: b
>       Group Key: c
>       Gather
>         Partial GroupAggregate
>           Output: PARTIAL count(*), GroupingSetID()
>           Group Key: a
>           Sort Key: b
>             Group Key: b
>           Sort Key: c
>             Group Key: c
>           Sort
>             Sort Key: a
>             Parallel Seq Scan on foo
Yes, this is what the v3 patch does.
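To make the "dispatch by set id" part concrete, here is a rough plain-SQL
analogue (same hypothetical foo table as above; grouping(a, b, c) is used
as a stand-in for the internal GroupingSetID(), and the UNION ALL branches
play the role of the finalize step that only advances each group on rows
tagged for it):

    -- Partial phase: a grouping-sets aggregate over what each worker
    -- would see, with each row tagged by the set that produced it.
    WITH partial AS (
        SELECT a, b, c,
               grouping(a, b, c) AS gset_id,  -- 3 for set (a), 5 for (b), 6 for (c)
               count(*) AS partial_count
        FROM foo
        GROUP BY GROUPING SETS ((a), (b), (c))
    )
    -- Finalize phase: each grouping set consumes only the partial rows
    -- tagged for it, written out here as one branch per set.
    SELECT a, NULL::int AS b, NULL::int AS c, sum(partial_count) AS cnt
    FROM partial WHERE gset_id = 3      -- rows produced for set (a)
    GROUP BY a
    UNION ALL
    SELECT NULL, b, NULL, sum(partial_count)
    FROM partial WHERE gset_id = 5      -- rows produced for set (b)
    GROUP BY b
    UNION ALL
    SELECT NULL, NULL, c, sum(partial_count)
    FROM partial WHERE gset_id = 6      -- rows produced for set (c)
    GROUP BY c;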
We (Pengzhou and I) had an offline discussion on this plan and we have
another idea: since we have tagged each tuple produced by the partial
aggregate with a 'GroupingSetId', why not then perform a normal grouping
sets aggregation in the final phase, with the 'GroupingSetId' included in
the group keys? The plan looks like:
# explain (costs off, verbose)
select c1, c2, c3, avg(c3) from gstest group by grouping sets((c1,c2),(c1),(c2,c3));
                            QUERY PLAN
------------------------------------------------------------------
 Finalize GroupAggregate
   Output: c1, c2, c3, avg(c3)
   Group Key: (gset_id), gstest.c1, gstest.c2, gstest.c3
   ->  Sort
         Output: c1, c2, c3, (gset_id), (PARTIAL avg(c3))
         Sort Key: (gset_id), gstest.c1, gstest.c2, gstest.c3
         ->  Gather
               Output: c1, c2, c3, (gset_id), (PARTIAL avg(c3))
               Workers Planned: 4
               ->  Partial HashAggregate
                     Output: c1, c2, c3, gset_id, PARTIAL avg(c3)
                     Hash Key: gstest.c1, gstest.c2
                     Hash Key: gstest.c1
                     Hash Key: gstest.c2, gstest.c3
                     ->  Parallel Seq Scan on public.gstest
                           Output: c1, c2, c3
This plan should be able to give correct results. We are still thinking
about whether it is a better plan than the 'multiplexed pipe' plan in v3.
Any thoughts here would be appreciated.
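For what it's worth, the final phase of the plan above can also be
sketched in plain SQL (assuming gstest(c1 int, c2 int, c3 int);
grouping(c1, c2, c3) stands in for the internal gset_id, and
PARTIAL avg(c3) is emulated with a sum/count pair):

    WITH partial AS (
        -- Partial phase: grouping-sets aggregate, tagging each row with
        -- the set that produced it.
        SELECT c1, c2, c3,
               grouping(c1, c2, c3) AS gset_id,
               sum(c3)   AS partial_sum,
               count(c3) AS partial_count
        FROM gstest
        GROUP BY GROUPING SETS ((c1, c2), (c1), (c2, c3))
    )
    -- Final phase: a plain aggregate whose group key includes gset_id,
    -- so rows from different grouping sets are never merged, even when
    -- their key columns collide on NULLs.
    SELECT c1, c2, c3,
           sum(partial_sum)::numeric / nullif(sum(partial_count), 0) AS avg
    FROM partial
    GROUP BY gset_id, c1, c2, c3;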
> Note that for the first approach to be viable, the partial aggregate
> *has to* use a group key that's the union of all grouping sets. In cases
> where individual columns have a low cardinality but joint cardinality is
> high (say columns a, b, c each has 16 distinct values, but they are
> independent, so there are 4096 distinct values on (a,b,c)), this results
> in fairly high traffic through the shm tuple queue.
Yes, you are right. This is the case mentioned by David earlier in [1].
In this case, ideally the parallel plan would lose when competing with
the non-parallel plan in add_path() and so would not be chosen.
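As a rough illustration of the size of that effect (hypothetical data with
16 independent values per column), a union-key partial aggregate emits one
row per (a, b, c) combination, while a per-set partial aggregate would emit
one row per value of each individual column:

    CREATE TEMP TABLE wide AS
    SELECT i % 16 AS a, (i / 16) % 16 AS b, (i / 256) % 16 AS c
    FROM generate_series(0, 99999) AS i;

    -- Union-key partial aggregate (the v5 shape): 4096 rows.
    SELECT count(*) FROM (SELECT 1 FROM wide GROUP BY a, b, c) s;

    -- Per-set partial aggregate (the gset_id shape): 16 + 16 + 16 = 48 rows.
    SELECT count(*)
    FROM (SELECT 1 FROM wide GROUP BY GROUPING SETS ((a), (b), (c))) s;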
Thanks
Richard