Home > mailing lists

Re: Performance Issue on Query 18 of TPC-H Benchmark - Mailing list pgsql-bugs

From	Andrei Lepikhov
Subject	Re: Performance Issue on Query 18 of TPC-H Benchmark
Date	October 17 08:51:12
Msg-id	98398f84-6079-47f9-958c-c7e323228cc7@gmail.com Whole thread Raw
In response to	Re: Performance Issue on Query 18 of TPC-H Benchmark (Ba Jinsheng <bajinsheng@u.nus.edu>)
Responses	Re: Performance Issue on Query 18 of TPC-H Benchmark
List	pgsql-bugs

Tree view

On 10/17/24 01:26, Ba Jinsheng wrote:
>  >I believe you are not allowing the optimizer to generate a different 
> aggregation path (Group Aggregate) because it requires a sort operation. 
> So I think this is not correct.
> 
> Yes, this is what I did. I though it is what you were asking? I have not 
> found another way to enforce HashAggregate, so I directly modified the 
> code. Can you eliberate why it is incorrect?
As I see, the main problem lies in the first aggregate (with HAVING 
clause) where hash aggregation seems preferable. To disable that, you 
can use some trick:
SET enable_hashagg = 'on';
SET enable_sort = 'off';
SET work_mem = '1GB';

In my case the optimiser have built aggregation:
->  HashAggregate  (cost=202639.35..208346.45 rows=126825 width=4)
     (actual time=3540.761..4224.547 rows=9 loops=3)
     Group Key: lineitem_1.l_orderkey
       Filter: (sum(lineitem_1.l_quantity) > '314'::numeric)
       Batches: 1  Memory Usage: 638993kB
       Rows Removed by Filter: 1499991
       Worker 0:  Batches: 1  Memory Usage: 638993kB
       Worker 1:  Batches: 1  Memory Usage: 638993kB
       ->  Seq Scan on lineitem lineitem_1
       (cost=0.00..172626.23 rows=6002623 width=9)
       (actual time=0.014..675.552 rows=6001215 loops=3)

Not sure it is the best plan possible in this case. I know only about 
re-optimisation feature which can provide correct number of groups 
estimation to aggregates as well as cardinality and work_mem and give 
the optimiser all correct estimations. But it is still an enterprise 
feature :(.

> Can I understand disabling parallelism is a good setup for finding 
> performance issues?
I usually use such such a switch to identify problems. Parallel workers 
use HashJoin more frequently and it sometimes cause non-linear behaviour 
of execution time, compared to sequental plan. At the same time, they 
donn't support parameterised paths and it may end up in interesting 
performance jumps ...

BTW, I also wonder why do you report to pgsql-bugs in presence of better 
fitted pgsql-performance thread?

-- 
regards, Andrei Lepikhov

pgsql-bugs by date:

From: David Rowley
Date: 17 October, 08:14:35
Subject: Re: Performance Issue on Query 18 of TPC-H Benchmark

From: Amit Langote
Date: 17 October, 09:52:09
Subject: Re: BUG #18657: Using JSON_OBJECTAGG with volatile function leads to segfault

Re: Performance Issue on Query 18 of TPC-H Benchmark - Mailing list pgsql-bugs

Previous

Next