Re: Eager aggregation, take 3 - Mailing list pgsql-hackers
From | Paul George |
---|---|
Subject | Re: Eager aggregation, take 3 |
Date | |
Msg-id | CALA8mJquG_zCJXfVwash5LKqHGtZXQmq7RfTSaRDUzGYeW=7Rw@mail.gmail.com Whole thread Raw |
In response to | Re: Eager aggregation, take 3 (Richard Guo <guofenglinux@gmail.com>) |
Responses |
Re: Eager aggregation, take 3
|
List | pgsql-hackers |
Richard:
Thanks for reviving this patch and for all of your work on it! Eager aggregation pushdown will be beneficial for my work and I'm hoping to see it land.
I was playing around with v9 of the patches and was specifically curious about this previous statement...
Thanks for reviving this patch and for all of your work on it! Eager aggregation pushdown will be beneficial for my work and I'm hoping to see it land.
I was playing around with v9 of the patches and was specifically curious about this previous statement...
>This patch also makes eager aggregation work with outer joins. With
>outer join, the aggregate cannot be pushed down if any column referenced
>by grouping expressions or aggregate functions is nullable by an outer
>join above the relation to which we want to apply the partiall
>aggregation. Thanks to Tom's outer-join-aware-Var infrastructure, we
>can easily identify such situations and subsequently refrain from
>pushing down the aggregates.
>outer join, the aggregate cannot be pushed down if any column referenced
>by grouping expressions or aggregate functions is nullable by an outer
>join above the relation to which we want to apply the partiall
>aggregation. Thanks to Tom's outer-join-aware-Var infrastructure, we
>can easily identify such situations and subsequently refrain from
>pushing down the aggregates.
...and this related comment in eager_aggregate.out:
>-- Ensure aggregation cannot be pushed down to the nullable side
While I'm new to this work and its subtleties, I'm wondering if this is too broad a condition.
I modified the first test query in eager_aggregate.sql to make it a LEFT JOIN and eager aggregation indeed did not happen, which is expected based on the comments upthread.
query:
SET enable_eager_aggregate=ON;
EXPLAIN (VERBOSE, COSTS OFF)
SELECT t1.a, sum(t2.c) FROM eager_agg_t1 t1 LEFT JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
plan:
QUERY PLAN
------------------------------------------------------------
GroupAggregate
Output: t1.a, sum(t2.c)
Group Key: t1.a
-> Sort
Output: t1.a, t2.c
Sort Key: t1.a
-> Hash Right Join
Output: t1.a, t2.c
Hash Cond: (t2.b = t1.b)
-> Seq Scan on public.eager_agg_t2 t2
Output: t2.a, t2.b, t2.c
-> Hash
Output: t1.a, t1.b
-> Seq Scan on public.eager_agg_t1 t1
Output: t1.a, t1.b
(15 rows)
(NOTE: I changed the aggregate from avg(...) to sum(...) for simplicity)
But, it seems that eager aggregation for the query above can be "replicated" as:
query:
EXPLAIN (VERBOSE, COSTS OFF)
SELECT t1.a, sum(t2.c)
FROM eager_agg_t1 t1
LEFT JOIN (
SELECT b, sum(c) c
FROM eager_agg_t2 t2p
GROUP BY b
) t2 ON t1.b = t2.b
GROUP BY t1.a
ORDER BY t1.a;
The output of both the original query and this one match (and the plans with eager aggregation and the subquery are nearly identical if you restore the LEFT JOIN to a JOIN). I admittedly may be missing a subtlety, but does this mean that there are conditions under which eager aggregation can be pushed down to the nullable side?
>-- Ensure aggregation cannot be pushed down to the nullable side
While I'm new to this work and its subtleties, I'm wondering if this is too broad a condition.
I modified the first test query in eager_aggregate.sql to make it a LEFT JOIN and eager aggregation indeed did not happen, which is expected based on the comments upthread.
query:
SET enable_eager_aggregate=ON;
EXPLAIN (VERBOSE, COSTS OFF)
SELECT t1.a, sum(t2.c) FROM eager_agg_t1 t1 LEFT JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
plan:
QUERY PLAN
------------------------------------------------------------
GroupAggregate
Output: t1.a, sum(t2.c)
Group Key: t1.a
-> Sort
Output: t1.a, t2.c
Sort Key: t1.a
-> Hash Right Join
Output: t1.a, t2.c
Hash Cond: (t2.b = t1.b)
-> Seq Scan on public.eager_agg_t2 t2
Output: t2.a, t2.b, t2.c
-> Hash
Output: t1.a, t1.b
-> Seq Scan on public.eager_agg_t1 t1
Output: t1.a, t1.b
(15 rows)
(NOTE: I changed the aggregate from avg(...) to sum(...) for simplicity)
But, it seems that eager aggregation for the query above can be "replicated" as:
query:
EXPLAIN (VERBOSE, COSTS OFF)
SELECT t1.a, sum(t2.c)
FROM eager_agg_t1 t1
LEFT JOIN (
SELECT b, sum(c) c
FROM eager_agg_t2 t2p
GROUP BY b
) t2 ON t1.b = t2.b
GROUP BY t1.a
ORDER BY t1.a;
The output of both the original query and this one match (and the plans with eager aggregation and the subquery are nearly identical if you restore the LEFT JOIN to a JOIN). I admittedly may be missing a subtlety, but does this mean that there are conditions under which eager aggregation can be pushed down to the nullable side?
-Paul-
On Sat, Jul 6, 2024 at 4:56 PM Richard Guo <guofenglinux@gmail.com> wrote:
On Thu, Jun 13, 2024 at 4:07 PM Richard Guo <guofenglinux@gmail.com> wrote:
> I spent some time testing this patchset and found a few more issues.
> ...
> Hence here is the v8 patchset, with fixes for all the above issues.
I found an 'ORDER/GROUP BY expression not found in targetlist' error
with this patchset, with the query below:
create table t (a boolean);
set enable_eager_aggregate to on;
explain (costs off)
select min(1) from t t1 left join t t2 on t1.a group by (not (not
t1.a)), t1.a order by t1.a;
ERROR: ORDER/GROUP BY expression not found in targetlist
This happens because the two grouping items are actually the same and
standard_qp_callback would remove one of them. The fully-processed
groupClause is kept in root->processed_groupClause. However, when
collecting grouping expressions in create_grouping_expr_infos, we are
checking parse->groupClause, which is incorrect.
The fix is straightforward: check root->processed_groupClause instead.
Here is a new rebase with this fix.
Thanks
Richard
pgsql-hackers by date: