Re: Using each rel as both outer and inner for JOIN_ANTI - Mailing list pgsql-hackers

From Richard Guo
Subject Re: Using each rel as both outer and inner for JOIN_ANTI
Date
Msg-id CAMbWs492b4h2-zHyFJ9EGu4bgmOUaupeUpz=7yubyip=mZjvTA@mail.gmail.com
Whole thread Raw
In response to Re: Using each rel as both outer and inner for JOIN_ANTI  (Alvaro Herrera <alvherre@alvh.no-ip.org>)
Responses Re: Using each rel as both outer and inner for JOIN_ANTI
List pgsql-hackers

On Tue, Aug 9, 2022 at 6:54 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
I suppose this looks good as far as the plan goes, but the cost estimation
might be a little bit too optimistic: it is reporting that the new plan
costs 50% of the original, yet the execution time is only 5% lower.

Thanks for trying this patch. Yeah, the estimated cost doesn't match the
execution time here. I tried the query locally and here is what I got
(best of three):

Unpatched:
# explain analyze select * from foo left join bar on foo.a = bar.c where bar.c is null;
                                                        QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------
 Hash Anti Join  (cost=154156.00..173691.19 rows=1 width=16) (actual time=1548.622..1548.624 rows=0 loops=1)
   Hash Cond: (foo.a = bar.c)
   ->  Seq Scan on foo  (cost=0.00..1.10 rows=10 width=8) (actual time=0.024..0.026 rows=10 loops=1)
   ->  Hash  (cost=72124.00..72124.00 rows=5000000 width=8) (actual time=1443.157..1443.158 rows=5000000 loops=1)
         Buckets: 262144  Batches: 64  Memory Usage: 5107kB
         ->  Seq Scan on bar  (cost=0.00..72124.00 rows=5000000 width=8) (actual time=0.045..481.059 rows=5000000 loops=1)
 Planning Time: 0.262 ms
 Execution Time: 1549.138 ms
(8 rows)

Patched:
# explain analyze select * from foo left join bar on foo.a = bar.c where bar.c is null;
                                                     QUERY PLAN
---------------------------------------------------------------------------------------------------------------------
 Hash Right Anti Join  (cost=1.23..90875.33 rows=1 width=16) (actual time=985.773..985.775 rows=0 loops=1)
   Hash Cond: (bar.c = foo.a)
   ->  Seq Scan on bar  (cost=0.00..72124.00 rows=5000000 width=8) (actual time=0.095..438.333 rows=5000000 loops=1)
   ->  Hash  (cost=1.10..1.10 rows=10 width=8) (actual time=0.076..0.077 rows=10 loops=1)
         Buckets: 1024  Batches: 1  Memory Usage: 9kB
         ->  Seq Scan on foo  (cost=0.00..1.10 rows=10 width=8) (actual time=0.060..0.064 rows=10 loops=1)
 Planning Time: 0.290 ms
 Execution Time: 985.830 ms
(8 rows)

Seems the cost matches the execution time better in my local box.

The right-anti join plan has the same cost estimation with right join
plan in this case. So would you please help to test what the right join
plan looks like in your env for the query below?

 select * from foo left join bar on foo.a = bar.c;

Thanks
Richard

pgsql-hackers by date:

Previous
From: "Drouvot, Bertrand"
Date:
Subject: Fix a typo in pgstatfuncs.c
Next
From: Nazir Bilal Yavuz
Date:
Subject: Re: [RFC] building postgres with meson - v10