Re: Eager aggregation, take 3 - Mailing list pgsql-hackers
From | Matheus Alcantara |
---|---|
Subject | Re: Eager aggregation, take 3 |
Date | |
Msg-id | DC2E8YU1D3M8.3M8S05QFY4XAR@gmail.com Whole thread Raw |
In response to | Re: Eager aggregation, take 3 (Richard Guo <guofenglinux@gmail.com>) |
Responses |
Re: Eager aggregation, take 3
|
List | pgsql-hackers |
On 08/08/25 22:32, Richard Guo wrote: >> It sounds like a good way to go for me, looking forward to the next >> patch version to perform some other tests. > > OK. Here it is. > Thanks! I can confirm now that I can see the eager aggregate in action in some of these queries that I've tested on the TPC-DS benchmark. I few questions regarding the new version: I've noticed that when a query has a WHERE clause filtering columns from the same relation being aggregated using "=" operator the Partial and Finalize aggregation nodes are not present on explain results even if setup_eager_aggregation() returns true on all if statements and also RelAggInfo->agg_useful is true. For example, consider this query that is used on eager aggregation paper that use some tables from TPC-H benchmark: tpch=# show enable_eager_aggregate ; enable_eager_aggregate ------------------------ on (1 row) tpch=# set max_parallel_workers_per_gather to 0; SET tpch=# EXPLAIN(COSTS OFF) SELECT O_CLERK, SUM(L_EXTENDEDPRICE * (1 - L_DISCOUNT)) AS LOSS FROM LINEITEM JOIN ORDERS ON L_ORDERKEY = O_ORDERKEY WHERE L_RETURNFLAG = 'R' GROUP BY O_CLERK; QUERY PLAN -------------------------------------------------------------- HashAggregate Group Key: orders.o_clerk -> Hash Join Hash Cond: (lineitem.l_orderkey = orders.o_orderkey) -> Seq Scan on lineitem Filter: (l_returnflag = 'R'::bpchar) -> Hash -> Seq Scan on orders (8 rows) Debugging this query shows that all if conditions on setup_eager_aggregation() returns false and create_agg_clause_infos() and create_grouping_expr_infos() are called. The RelAggInfo->agg_useful is also being set to true so I would expect to see Finalize and Partial agg nodes, is this correct or am I missing something here? Removing the WHERE clause I can see the Finalize and Partial agg nodes: tpch=# EXPLAIN(COSTS OFF) SELECT O_CLERK, SUM(L_EXTENDEDPRICE * (1 - L_DISCOUNT)) AS LOSS FROM LINEITEM JOIN ORDERS ON L_ORDERKEY = O_ORDERKEY GROUP BY O_CLERK; QUERY PLAN ---------------------------------------------------------------------- Finalize HashAggregate Group Key: orders.o_clerk -> Merge Join Merge Cond: (lineitem.l_orderkey = orders.o_orderkey) -> Partial GroupAggregate Group Key: lineitem.l_orderkey -> Index Scan using idx_lineitem_orderkey on lineitem -> Index Scan using orders_pkey on orders (8 rows) This can also be reproduced with an addition of a WHERE clause on some tests on eager_aggregate.sql: postgres=# EXPLAIN (VERBOSE, COSTS OFF) SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b WHERE t2.c = 5 GROUP BY t1.a ORDER BY t1.a; QUERY PLAN ------------------------------------------------------------------ GroupAggregate Output: t1.a, avg(t2.c) Group Key: t1.a -> Sort Output: t1.a, t2.c Sort Key: t1.a -> Hash Join Output: t1.a, t2.c Hash Cond: (t1.b = t2.b) -> Seq Scan on public.eager_agg_t1 t1 Output: t1.a, t1.b, t1.c -> Hash Output: t2.c, t2.b -> Seq Scan on public.eager_agg_t2 t2 Output: t2.c, t2.b Filter: (t2.c = '5'::double precision) (16 rows) Note that if I use ">" operator for example, this doesn't happen: SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b WHERE t2.c > 5 GROUP BY t1.a ORDER BY t1.a; QUERY PLAN ------------------------------------------------------------------------ Finalize GroupAggregate Output: t1.a, avg(t2.c) Group Key: t1.a -> Sort Output: t1.a, (PARTIAL avg(t2.c)) Sort Key: t1.a -> Hash Join Output: t1.a, (PARTIAL avg(t2.c)) Hash Cond: (t1.b = t2.b) -> Seq Scan on public.eager_agg_t1 t1 Output: t1.a, t1.b, t1.c -> Hash Output: t2.b, (PARTIAL avg(t2.c)) -> Partial HashAggregate Output: t2.b, PARTIAL avg(t2.c) Group Key: t2.b -> Seq Scan on public.eager_agg_t2 t2 Output: t2.a, t2.b, t2.c Filter: (t2.c > '5'::double precision) (19 rows) Is this behavior correct? If it's correct, would be possible to check this limitation on setup_eager_aggregation() and maybe skip all the other work? -- Matheus Alcantara
pgsql-hackers by date: