Re: Performance Issue on Query 18 of TPC-H Benchmark - Mailing list pgsql-bugs

From David Rowley
Subject Re: Performance Issue on Query 18 of TPC-H Benchmark
Date
Msg-id CAApHDvpbwaYyOwvUmNfaxpejDE2MO6C7=RsScrHv0m-=sm5d7A@mail.gmail.com
Whole thread Raw
In response to Performance Issue on Query 18 of TPC-H Benchmark  (Ba Jinsheng <bajinsheng@u.nus.edu>)
List pgsql-bugs
On Wed, 16 Oct 2024 at 07:29, Ba Jinsheng <bajinsheng@u.nus.edu> wrote:
> While I found that if we disable the path sorting here:
> diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
> index 0c7273b9cc..91320f473a 100644
> --- a/src/backend/optimizer/plan/planner.c
> +++ b/src/backend/optimizer/plan/planner.c
> @@ -6928,7 +6928,7 @@ make_ordered_path(PlannerInfo *root, RelOptInfo *rel, Path *path,
>                                                                                         path->pathkeys,
>                                                                                         &presorted_keys);
>
> -       if (!is_sorted)
> +       if (false)
>         {
>                 /*
>                  * Try at least sorting the cheapest path and also try incrementally
>
> Both the performance and estimated cost are reduced around 40%:

>                      ->  Partial GroupAggregate  (cost=214938.45..317625.48 rows=211468 width=71) (actual
time=1616.827..1648.580rows=3 loops=3)
 
>                            Group Key: customer.c_custkey, orders.o_orderkey
>                            ->  Nested Loop  (cost=214938.45..313396.12 rows=211468 width=44) (actual
time=1609.796..1648.571rows=21 loops=3)
 
>                                  ->  Parallel Hash Join  (cost=214938.02..249103.36 rows=52844 width=43) (actual
time=1609.777..1648.532rows=3 loops=3)
 
>                                        Hash Cond: (orders.o_custkey = customer.c_custkey)
>                                        ->  Hash Join  (cost=209931.77..243958.39 rows=52844 width=24) (actual
time=1573.950..1612.634rows=3 loops=3)
 
>                                              Hash Cond: (orders.o_orderkey = lineitem_1.l_orderkey)
>                                              ->  Parallel Seq Scan on orders  (cost=0.00..32386.00 rows=625000
width=20)(actual time=0.016..32.211 rows=500000 loops=3)
 

GroupAggreate needs sorted input, so what you've done isn't valid.

David



pgsql-bugs by date:

Previous
From: Ba Jinsheng
Date:
Subject: Performance Issue on Query 18 of TPC-H Benchmark
Next
From: Alena Rybakina
Date:
Subject: Re: Performance Issue on Query 18 of TPC-H Benchmark