Re: Eager aggregation, take 3 - Mailing list pgsql-hackers

From Richard Guo
Subject Re: Eager aggregation, take 3
Date
Msg-id CAMbWs48sHEbQYZ5PPQdJKH6Vi4Hr-XYXkC6EObFhQORMdZNk9w@mail.gmail.com
In response to Re: Eager aggregation, take 3  ("Matheus Alcantara" <matheusssilv97@gmail.com>)
List pgsql-hackers
On Fri, Aug 15, 2025 at 4:22 AM Matheus Alcantara
<matheusssilv97@gmail.com> wrote:
> Debugging this query shows that all if conditions on
> setup_eager_aggregation() returns false and create_agg_clause_infos()
> and create_grouping_expr_infos() are called. The RelAggInfo->agg_useful
> is also being set to true so I would expect to see Finalize and Partial
> agg nodes, is this correct or am I missing something here?

Well, just because eager aggregation *can* be applied does not mean
that it *will* be; it depends on whether it produces a lower-cost
execution plan.  This transformation is cost-based, so it's not the
right mindset to assume that it will always be applied when possible.

In your case, with the filter "t2.c = 5", the row estimate for t2 is
just 1 after the filter has been applied.  The planner decides that
adding a partial aggregation on top of such a small result set doesn't
offer much benefit, which seems reasonable to me.

->  Hash  (cost=18.50..18.50 rows=1 width=12)
          (actual time=0.864..0.865 rows=1.00 loops=1)
      Buckets: 1024  Batches: 1  Memory Usage: 9kB
      ->  Seq Scan on eager_agg_t2 t2  (cost=0.00..18.50 rows=1 width=12)
                                       (actual time=0.060..0.851 rows=1.00 loops=1)
            Filter: (c = '5'::double precision)
            Rows Removed by Filter: 999


With the filter "t2.c > 5", the row estimate for t2 is 995 after
filtering.  A partial aggregation can reduce that to 10 rows, so the
planner decides that adding a partial aggregation is beneficial -- and
does so.  That also seems reasonable to me.

->  Partial HashAggregate  (cost=23.48..23.58 rows=10 width=36)
                           (actual time=2.427..2.438 rows=10.00 loops=1)
      Group Key: t2.b
      Batches: 1  Memory Usage: 32kB
      ->  Seq Scan on eager_agg_t2 t2  (cost=0.00..18.50 rows=995 width=12)
                                       (actual time=0.053..0.989 rows=995.00 loops=1)
            Filter: (c > '5'::double precision)
            Rows Removed by Filter: 5
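
To make the comparison between the two cases concrete, here is a toy sketch of the decision. It is not PostgreSQL's cost model: the Path class, the helper names, and the per-row cost constants are all made up for illustration. Only the row counts (1, 995, and 10 groups) come from the plans above. The idea is just that a partial aggregation costs something per input row, and pays off only if it shrinks the input to the join enough.

```python
from dataclasses import dataclass


@dataclass
class Path:
    """Hypothetical stand-in for a planner path: a label and a total cost."""
    name: str
    total_cost: float


def rows_after_partial_agg(input_rows: float, num_groups: float) -> float:
    # A partial aggregation emits at most one row per group,
    # and never more rows than it receives.
    return min(input_rows, num_groups)


def candidate_paths(input_rows: float, num_groups: float,
                    agg_cost_per_row: float = 0.5,
                    join_cost_per_row: float = 1.0) -> list[Path]:
    # Both per-row constants are assumptions chosen for illustration only.
    plain = Path("join without partial agg", join_cost_per_row * input_rows)
    reduced = rows_after_partial_agg(input_rows, num_groups)
    eager = Path("join over partial agg",
                 agg_cost_per_row * input_rows + join_cost_per_row * reduced)
    return [plain, eager]


def cheapest(paths: list[Path]) -> Path:
    # The "tournament": whichever candidate has the lowest total cost wins.
    return min(paths, key=lambda p: p.total_cost)


# t2.c = 5: one row in, one row out, so the aggregation is pure overhead.
print(cheapest(candidate_paths(input_rows=1, num_groups=10)).name)

# t2.c > 5: 995 rows collapse to 10, so the aggregation pays for itself.
print(cheapest(candidate_paths(input_rows=995, num_groups=10)).name)
```

With these made-up constants the first call picks the plain join and the second picks the path with the partial aggregation, which mirrors the two plans shown above.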

> Is this behavior correct? If it's correct, would be possible to check
> this limitation on setup_eager_aggregation() and maybe skip all the
> other work?

Hmm, I wouldn't consider this a limitation; it's just the result of
the planner's cost-based tournament for path selection.

Thanks
Richard


