Re: Using each rel as both outer and inner for JOIN_ANTI - Mailing list pgsql-hackers

From Richard Guo
Subject Re: Using each rel as both outer and inner for JOIN_ANTI
Date
Msg-id CAMbWs49mh1Fsnw1Zu5MzOzhHsh_Gcrz0egVk0MGAOq16ezXZPw@mail.gmail.com
Whole thread Raw
In response to Re: Using each rel as both outer and inner for JOIN_ANTI  (Alvaro Herrera <alvherre@alvh.no-ip.org>)
Responses Re: Using each rel as both outer and inner for JOIN_ANTI
List pgsql-hackers

On Wed, Aug 10, 2022 at 4:40 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
On 2022-Aug-10, Richard Guo wrote:

> The right-anti join plan has the same cost estimation with right join
> plan in this case. So would you please help to test what the right join
> plan looks like in your env for the query below?
>
>  select * from foo left join bar on foo.a = bar.c;

You're right, it does.

55432 16devel 475322=# explain (analyze, buffers)  select * from foo left join bar on foo.a = bar.c;
                                                      QUERY PLAN                                                     
──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
 Hash Right Join  (cost=1.23..90875.24 rows=10 width=20) (actual time=456.410..456.415 rows=10 loops=1)
   Hash Cond: (bar.c = foo.a)
   Buffers: shared hit=15852 read=6273
   ->  Seq Scan on bar  (cost=0.00..72124.00 rows=5000000 width=12) (actual time=0.036..210.468 rows=5000000 loops=1)
         Buffers: shared hit=15852 read=6272
   ->  Hash  (cost=1.10..1.10 rows=10 width=8) (actual time=0.037..0.038 rows=10 loops=1)
         Buckets: 1024  Batches: 1  Memory Usage: 9kB
         Buffers: shared read=1
         ->  Seq Scan on foo  (cost=0.00..1.10 rows=10 width=8) (actual time=0.022..0.026 rows=10 loops=1)
               Buffers: shared read=1
 Planning:
   Buffers: shared hit=92 read=13
 Planning Time: 1.077 ms
 Execution Time: 456.458 ms
(14 filas)

Thanks for help testing. Comparing the anti join plan and the right join
plan, the estimated cost and the execution time mismatch a lot. Seems
the cost estimate of hashjoin path is not that precise for this case
even in the unpatched codes. Maybe this is something we need to improve.

Thanks
Richard

pgsql-hackers by date:

Previous
From: John Naylor
Date:
Subject: Re: optimize lookups in snapshot [sub]xip arrays
Next
From: John Naylor
Date:
Subject: Re: [RFC] building postgres with meson