On 10/17/24 01:26, Ba Jinsheng wrote:
> >I believe you are not allowing the optimizer to generate a different
> aggregation path (Group Aggregate) because it requires a sort operation.
> So I think this is not correct.
>
> Yes, this is what I did. I though it is what you were asking? I have not
> found another way to enforce HashAggregate, so I directly modified the
> code. Can you eliberate why it is incorrect?
As I see, the main problem lies in the first aggregate (with HAVING
clause) where hash aggregation seems preferable. To disable that, you
can use some trick:
SET enable_hashagg = 'on';
SET enable_sort = 'off';
SET work_mem = '1GB';
In my case the optimiser have built aggregation:
-> HashAggregate (cost=202639.35..208346.45 rows=126825 width=4)
(actual time=3540.761..4224.547 rows=9 loops=3)
Group Key: lineitem_1.l_orderkey
Filter: (sum(lineitem_1.l_quantity) > '314'::numeric)
Batches: 1 Memory Usage: 638993kB
Rows Removed by Filter: 1499991
Worker 0: Batches: 1 Memory Usage: 638993kB
Worker 1: Batches: 1 Memory Usage: 638993kB
-> Seq Scan on lineitem lineitem_1
(cost=0.00..172626.23 rows=6002623 width=9)
(actual time=0.014..675.552 rows=6001215 loops=3)
Not sure it is the best plan possible in this case. I know only about
re-optimisation feature which can provide correct number of groups
estimation to aggregates as well as cardinality and work_mem and give
the optimiser all correct estimations. But it is still an enterprise
feature :(.
> Can I understand disabling parallelism is a good setup for finding
> performance issues?
I usually use such such a switch to identify problems. Parallel workers
use HashJoin more frequently and it sometimes cause non-linear behaviour
of execution time, compared to sequental plan. At the same time, they
donn't support parameterised paths and it may end up in interesting
performance jumps ...
BTW, I also wonder why do you report to pgsql-bugs in presence of better
fitted pgsql-performance thread?
--
regards, Andrei Lepikhov