Re: Performance of TPC-DS Query 95 - Mailing list pgsql-performance

From David Rowley
Subject Re: Performance of TPC-DS Query 95
Date
Msg-id CAApHDvr=zusfekWdNGe8dh_C23mBMXjhz6ZYkuJN_vbkgND0vw@mail.gmail.com
Whole thread Raw
In response to Performance of TPC-DS Query 95  (Ba Jinsheng <bajinsheng@u.nus.edu>)
List pgsql-performance
On Tue, 26 Nov 2024 at 09:55, Ba Jinsheng <bajinsheng@u.nus.edu> wrote:
> TPC-DS query 95:
> Its execution time is nearly 1 min:

>    CTE ws_wh
>      ->  Hash Join  (cost=37772.14..198810.77 rows=7242361 width=12) (actual time=211.161..1443.926 rows=6644004
loops=1)

> If applying this patch:

> -       if (is_innerrel_unique_for(root, joinrelids, outerrelids, innerrel,
> +       if (!is_innerrel_unique_for(root, joinrelids, outerrelids, innerrel,

> The execution time is reduced to 6 seconds:

>    CTE ws_wh
>      ->  Hash Join  (cost=37772.14..74062.53 rows=7095248 width=12) (actual time=203.407..560.264 rows=719205
loops=1)

> The difference between both query plans is the second one uses Materialize instead of Memoize. From the code, it
seemsthat changing the usage of the cache brings performance improvement unexpectedly.
 

What's going on here is that you've introduced a new bug which has the
Hash Join effectively perform a semi-join rather than an inner join so
that it only joins the first matching inner-side row instead of all
matching inner-side rows for each outer-side row. The bug is fairly
evident when to see that the Hash Join produces 6644004 rows without
your change and only 719205 rows after you added the bug.

I'm not sure what you expect us to do here, but just in case you're
not aware, we're not going to prioritise performance over correct
results.

I expect the problem you've introduced will be more evident if you use
EXPLAIN ANALYZE VERBOSE. The verbose option will show which joins are
marked "Inner Unique". If you want to learn more about the
optimisation you've broken, see [1].

David

[1] https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=9c7f5229ad68d7e0e4dd149e3f80257893e404d4



pgsql-performance by date:

Previous
From: Ba Jinsheng
Date:
Subject: CTE Inline On TPC-DS Query 95
Next
From: David Rowley
Date:
Subject: Re: CTE Inline On TPC-DS Query 95