Re: Performance of Query 4 on TPC-DS Benchmark - Mailing list pgsql-performance

From Andrei Lepikhov
Subject Re: Performance of Query 4 on TPC-DS Benchmark
Msg-id bdb4da6e-4432-4cf3-8ec9-622e5fc64d83@gmail.com
In response to Performance of Query 4 on TPC-DS Benchmark  (Ba Jinsheng <bajinsheng@u.nus.edu>)
Responses Re: Performance of Query 4 on TPC-DS Benchmark
List pgsql-performance
On 11/11/24 02:35, Ba Jinsheng wrote:
> Hi all,
> 
> Please see this case:
> 
> 
> Query 4 on TPC-DS benchmark:
Thank you for the interesting example!
Looking into the explains, I see two sorts:
->  Sort  (cost=794037.94..794037.95 rows=1 width=132)
    (actual time=3024403.310..3024403.313 rows=8 loops=1)
->  Sort  (cost=794033.93..794033.94 rows=1 width=132)
    (actual time=8068.869..8068.872 rows=8 loops=1)

Almost the same cost, but very different execution times. So I think the core
of the problem is the accuracy of selectivity estimation.
In this specific example I see lots of composite scan filters:
- ((sale_type = 'w'::text) AND (dyear = 2002))
- ((year_total > '0'::numeric) AND (sale_type = 'w'::text) AND (dyear = 2001))
- ((year_total > '0'::numeric) AND (sale_type = 's'::text) AND (dyear = 2001))

Estimating such filters is always a challenge for PostgreSQL because it has
no information on the joint distribution of the columns involved.
Could you research this direction by building extended statistics on these
clauses? It could move the plan in a more optimal direction.
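As a sketch of what I mean (the table and column names here are illustrative;
in TPC-DS these predicates come from the year_total CTE, so the statistics
would have to be built on the underlying base-table columns that feed it):

```sql
-- Tell the planner about the joint distribution of the correlated columns,
-- so a filter like (sale_type = 'w' AND dyear = 2001) is not estimated as
-- the product of two independent selectivities.
CREATE STATISTICS sales_type_year_stats (dependencies, mcv)
    ON sale_type, dyear
    FROM some_sales_table;   -- hypothetical table name

-- Extended statistics are populated by ANALYZE, like ordinary statistics.
ANALYZE some_sales_table;
```

After that, EXPLAIN should show row estimates for the composite filter that
are much closer to the actual counts, which can steer the join order and
sort placement toward the faster plan.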

-- 
regards, Andrei Lepikhov


