Re: Eager aggregation, take 3 - Mailing list pgsql-hackers

From Robert Haas
Subject Re: Eager aggregation, take 3
Date
Msg-id CA+TgmoaycfwRGtVpzmgUbEibP5zyuNRZoJ4Sc0475tRW4rofoQ@mail.gmail.com
Whole thread Raw
In response to Re: Eager aggregation, take 3  (Richard Guo <guofenglinux@gmail.com>)
List pgsql-hackers
On Wed, Aug 6, 2025 at 3:52 AM Richard Guo <guofenglinux@gmail.com> wrote:
> Looking at TPC-DS queries 4 and 11, a threshold of 10 is the minimum
> needed to consider eager aggregation for them.  The resulting plans
> show nice performance improvements without any measurable increase in
> planning time.  So, I'm inclined to lower the threshold to 10 for now.
> (Wondering whether we should make this threshold a GUC, so users can
> adjust it based on their needs.)

Like Matheus, I think a GUC is reasonable. A significant danger here
appears to be the possibility of a performance cliff, where queries
are optimized very different when the ratio is 9.99 vs. 10.01, say. It
would be nice if there were some way to mitigate that danger, but at
least a GUC avoids chaining the performance of the whole system to a
hard-coded value.

It might be worth considering whether there are heuristics other than
the group size that could help here. Possibly that's just making
things more complicated to no benefit. It seems to me, for example,
that reducing 100 rows to 10 is quite different from reducing a
million rows to 100,000. On the whole, the latter seems more likely to
work out well, but it's tricky, because the effort expended per group
can be arbitrarily high. I think we do want to let the cost model make
most of the decisions, and just use this threshold to prune ideas that
are obviously bad at an early stage. That said, it's worth thinking
about how this interacts with the just-considered-one-eager-agg
strategy. Does this threshold apply before or after that rule?

For instance, consider AGG(FACT_TABLE JOIN DIMENSION_TABLE), like a
count of orders grouped by customer name. Aggregating on the dimension
table (in this case, the list of customers) is probably useless, but
aggregating on the join column of the fact table has a good chance of
being useful. If we consider only one of those strategies, we want it
to be the right one. This threshold could be the thing that helps us
to get it right.

--
Robert Haas
EDB: http://www.enterprisedb.com



pgsql-hackers by date:

Previous
From: "Burd, Greg"
Date:
Subject: Re: [PATCH] Add tests for Bitmapset
Next
From: Jacob Champion
Date:
Subject: Re: Differential Code Coverage report for Postgres