Re: Eager aggregation, take 3 - Mailing list pgsql-hackers

From Paul George
Subject Re: Eager aggregation, take 3
Date
Msg-id CALA8mJquG_zCJXfVwash5LKqHGtZXQmq7RfTSaRDUzGYeW=7Rw@mail.gmail.com
Whole thread Raw
In response to Re: Eager aggregation, take 3  (Richard Guo <guofenglinux@gmail.com>)
Responses Re: Eager aggregation, take 3
List pgsql-hackers
Richard:

Thanks for reviving this patch and for all of your work on it! Eager aggregation pushdown will be beneficial for my work and I'm hoping to see it land.


I was playing around with v9 of the patches and was specifically curious about this previous statement...

>This patch also makes eager aggregation work with outer joins.  With
>outer join, the aggregate cannot be pushed down if any column referenced
>by grouping expressions or aggregate functions is nullable by an outer
>join above the relation to which we want to apply the partiall
>aggregation.  Thanks to Tom's outer-join-aware-Var infrastructure, we
>can easily identify such situations and subsequently refrain from
>pushing down the aggregates.

 ...and this related comment in eager_aggregate.out:

>-- Ensure aggregation cannot be pushed down to the nullable side

While I'm new to this work and its subtleties, I'm wondering if this is too broad a condition.

I modified the first test query in eager_aggregate.sql to make it a LEFT JOIN and eager aggregation indeed did not happen, which is expected based on the comments upthread.

query:
SET enable_eager_aggregate=ON;
EXPLAIN (VERBOSE, COSTS OFF)
SELECT t1.a, sum(t2.c) FROM eager_agg_t1 t1 LEFT JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;

plan:
                         QUERY PLAN                        
------------------------------------------------------------
 GroupAggregate
   Output: t1.a, sum(t2.c)
   Group Key: t1.a
   ->  Sort
         Output: t1.a, t2.c
         Sort Key: t1.a
         ->  Hash Right Join
               Output: t1.a, t2.c
               Hash Cond: (t2.b = t1.b)
               ->  Seq Scan on public.eager_agg_t2 t2
                     Output: t2.a, t2.b, t2.c
               ->  Hash
                     Output: t1.a, t1.b
                     ->  Seq Scan on public.eager_agg_t1 t1
                           Output: t1.a, t1.b
(15 rows)

(NOTE: I changed the aggregate from avg(...) to sum(...) for simplicity)

But, it seems that eager aggregation for the query above can be "replicated" as:

query:

EXPLAIN (VERBOSE, COSTS OFF)
SELECT t1.a, sum(t2.c)
FROM eager_agg_t1 t1
LEFT JOIN (
    SELECT b, sum(c) c
    FROM eager_agg_t2 t2p
    GROUP BY b
) t2 ON t1.b = t2.b
GROUP BY t1.a
ORDER BY t1.a;

The output of both the original query and this one match (and the plans with eager aggregation and the subquery are nearly identical if you restore the LEFT JOIN to a JOIN). I admittedly may be missing a subtlety, but does this mean that there are conditions under which eager aggregation can be pushed down to the nullable side?


-Paul-

On Sat, Jul 6, 2024 at 4:56 PM Richard Guo <guofenglinux@gmail.com> wrote:
On Thu, Jun 13, 2024 at 4:07 PM Richard Guo <guofenglinux@gmail.com> wrote:
> I spent some time testing this patchset and found a few more issues.
> ...

> Hence here is the v8 patchset, with fixes for all the above issues.

I found an 'ORDER/GROUP BY expression not found in targetlist' error
with this patchset, with the query below:

create table t (a boolean);

set enable_eager_aggregate to on;

explain (costs off)
select min(1) from t t1 left join t t2 on t1.a group by (not (not
t1.a)), t1.a order by t1.a;
ERROR:  ORDER/GROUP BY expression not found in targetlist

This happens because the two grouping items are actually the same and
standard_qp_callback would remove one of them.  The fully-processed
groupClause is kept in root->processed_groupClause.  However, when
collecting grouping expressions in create_grouping_expr_infos, we are
checking parse->groupClause, which is incorrect.

The fix is straightforward: check root->processed_groupClause instead.

Here is a new rebase with this fix.

Thanks
Richard

pgsql-hackers by date:

Previous
From: Joseph Koshakow
Date:
Subject: Re: Remove dependence on integer wrapping
Next
From: Andres Freund
Date:
Subject: tests fail on windows with default git settings