Home > mailing lists

Re: Performance of TPC-DS Query 95 - Mailing list pgsql-performance

From	David Rowley
Subject	Re: Performance of TPC-DS Query 95
Date	November 26, 2024 01:41:22
Msg-id	CAApHDvr=zusfekWdNGe8dh_C23mBMXjhz6ZYkuJN_vbkgND0vw@mail.gmail.com Whole thread
In response to	Performance of TPC-DS Query 95 (Ba Jinsheng <bajinsheng@u.nus.edu>)
List	pgsql-performance

Tree view

On Tue, 26 Nov 2024 at 09:55, Ba Jinsheng <bajinsheng@u.nus.edu> wrote:
> TPC-DS query 95:
> Its execution time is nearly 1 min:

>    CTE ws_wh
>      ->  Hash Join  (cost=37772.14..198810.77 rows=7242361 width=12) (actual time=211.161..1443.926 rows=6644004
loops=1)

> If applying this patch:

> -       if (is_innerrel_unique_for(root, joinrelids, outerrelids, innerrel,
> +       if (!is_innerrel_unique_for(root, joinrelids, outerrelids, innerrel,

> The execution time is reduced to 6 seconds:

>    CTE ws_wh
>      ->  Hash Join  (cost=37772.14..74062.53 rows=7095248 width=12) (actual time=203.407..560.264 rows=719205
loops=1)

> The difference between both query plans is the second one uses Materialize instead of Memoize. From the code, it
seemsthat changing the usage of the cache brings performance improvement unexpectedly.

What's going on here is that you've introduced a new bug which has the
Hash Join effectively perform a semi-join rather than an inner join so
that it only joins the first matching inner-side row instead of all
matching inner-side rows for each outer-side row. The bug is fairly
evident when to see that the Hash Join produces 6644004 rows without
your change and only 719205 rows after you added the bug.

I'm not sure what you expect us to do here, but just in case you're
not aware, we're not going to prioritise performance over correct
results.

I expect the problem you've introduced will be more evident if you use
EXPLAIN ANALYZE VERBOSE. The verbose option will show which joins are
marked "Inner Unique". If you want to learn more about the
optimisation you've broken, see [1].

David

[1] https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=9c7f5229ad68d7e0e4dd149e3f80257893e404d4

pgsql-performance by date:

From: Ba Jinsheng
Date: 26 November 2024, 00:02:46
Subject: CTE Inline On TPC-DS Query 95

From: David Rowley
Date: 26 November 2024, 07:03:24
Subject: Re: CTE Inline On TPC-DS Query 95

Re: Performance of TPC-DS Query 95 - Mailing list pgsql-performance

Previous

Next