Re: Eager aggregation, take 3 - Mailing list pgsql-hackers

From Matheus Alcantara
Subject Re: Eager aggregation, take 3
Date
Msg-id DD8YF2IZ2M1C.1C0WAMOSIW59K@gmail.com
Whole thread Raw
In response to Re: Eager aggregation, take 3  (Richard Guo <guofenglinux@gmail.com>)
List pgsql-hackers
On Fri Oct 3, 2025 at 12:14 AM -03, Richard Guo wrote:
>> The only query that I see a considerable regression is query 23 which I
>> get a 23% worst execution time. I'm attaching the EXPLAIN(ANALYZE)
>> output from master and from the patched version if it's interesting.
>
> I tested query 23 in my local environment but didn't observe the
> regression.
>
> -- on master
>  Planning Time: 1.950 ms
>  Execution Time: 3260.924 ms
>
> -- on patched
>  Planning Time: 2.197 ms
>  Execution Time: 3237.287 ms
>
> I ran the benchmark at scale factor 1 and executed ANALYZE beforehand.
> For the build configuration, I disabled cassert.
>
I've disabled the cassert and executed the ANALYZE again before
benchmarking and now I have similar results with a improvement on eager
aggregate version:

-- master
Planning Time: 2.734 ms
Execution Time: 5238.128 ms

-- patched
Planning Time: 2.578 ms
Execution Time: 4732.584 ms

> Comparing the plans, I noticed one key difference: in the plan you
> provided (query-23.patch.explain), the frequent_ss_items CTE uses
> parallel aggregation, whereas in my local environment it does not.
> This leads to a different final join order between the two plans.
>
> However, given the highly inaccurate size and cost estimates for the
> CTE Scan nodes, I'm not sure it's worth investigating further.  I'm
> starting to feel that trying to tune performance here, with such
> inaccurate underlying estimates for CTEs, is like building on sand.
>
> [ ...]
>
>> I'm just wondering if there is anything that can be done on the planner
>> to prevent this type of situation?
>
> I think the ideal solution is to improve our estimates for CTE
> relations to make the plans for TPC-DS queries more reasonable.  Of
> course, for queries from other benchmarks, the issues may stem from
> other plan nodes.  IMHO, we really need some improvements in our cost
> estimation.
>
Fair points, agree.

The performance results look good to me. I don't have to much comments
about the code although I'm still learning about the planner internals
this patch seems in good shape to me.

I'm just attaching a new csv with the last results after running with
cassert disabled and after executing ANALYZE. It looks good to me.

Thanks for working on this!

--
Matheus Alcantara

Attachment

pgsql-hackers by date:

Previous
From: Nathan Bossart
Date:
Subject: Re: a couple of small patches for simd.h
Next
From: Shubham Khanna
Date:
Subject: Re: Add support for specifying tables in pg_createsubscriber.