Re: Using each rel as both outer and inner for JOIN_ANTI - Mailing list pgsql-hackers

From Alvaro Herrera
Subject Re: Using each rel as both outer and inner for JOIN_ANTI
Date
Msg-id 20220809105427.ym3ehoop7dnjame6@alvherre.pgsql
Whole thread Raw
In response to Re: Using each rel as both outer and inner for JOIN_ANTI  (Richard Guo <guofenglinux@gmail.com>)
Responses Re: Using each rel as both outer and inner for JOIN_ANTI
List pgsql-hackers
Just for kicks, I ran query in your original post under EXPLAIN ANALYZE
in both patched and unpatched with this last version.  I got this (best
of three):

Unpatched:
55432 16devel 437532=# explain (analyze, buffers) select * from foo left join bar on foo.a = bar.c where bar.c is
null;
                                                         QUERY PLAN
   
 

────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
 Hash Anti Join  (cost=159039.00..183457.23 rows=10 width=20) (actual time=482.788..483.182 rows=10 loops=1)
   Hash Cond: (foo.a = bar.c)
   Buffers: shared hit=161 read=21964, temp read=8 written=8
   ->  Seq Scan on foo  (cost=0.00..1.10 rows=10 width=8) (actual time=0.020..0.022 rows=10 loops=1)
         Buffers: shared hit=1
   ->  Hash  (cost=72124.00..72124.00 rows=5000000 width=12) (actual time=482.128..482.129 rows=0 loops=1)
         Buckets: 262144  Batches: 64  Memory Usage: 2048kB
         Buffers: shared hit=160 read=21964
         ->  Seq Scan on bar  (cost=0.00..72124.00 rows=5000000 width=12) (actual time=0.092..237.431 rows=5000000
loops=1)
               Buffers: shared hit=160 read=21964
 Planning Time: 0.182 ms
 Execution Time: 483.248 ms


Patched:
                                                      QUERY PLAN                                                      
──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
 Hash Right Anti Join  (cost=1.23..90875.24 rows=10 width=20) (actual time=457.654..457.658 rows=10 loops=1)
   Hash Cond: (bar.c = foo.a)
   Buffers: shared hit=33 read=22092
   ->  Seq Scan on bar  (cost=0.00..72124.00 rows=5000000 width=12) (actual time=0.020..229.097 rows=5000000 loops=1)
         Buffers: shared hit=32 read=22092
   ->  Hash  (cost=1.10..1.10 rows=10 width=8) (actual time=0.011..0.012 rows=10 loops=1)
         Buckets: 1024  Batches: 1  Memory Usage: 9kB
         Buffers: shared hit=1
         ->  Seq Scan on foo  (cost=0.00..1.10 rows=10 width=8) (actual time=0.006..0.007 rows=10 loops=1)
               Buffers: shared hit=1
 Planning Time: 0.067 ms
 Execution Time: 457.679 ms



I suppose this looks good as far as the plan goes, but the cost estimation
might be a little bit too optimistic: it is reporting that the new plan
costs 50% of the original, yet the execution time is only 5% lower.

I wonder where does time go (in unpatched) when seqscanning finishes
and before hashing starts.

(I had to disable JIT for the first one, as it insisted on JITting tuple
deforming.)

-- 
Álvaro Herrera               48°01'N 7°57'E  —  https://www.EnterpriseDB.com/
Bob [Floyd] used to say that he was planning to get a Ph.D. by the "green
stamp method," namely by saving envelopes addressed to him as 'Dr. Floyd'.
After collecting 500 such letters, he mused, a university somewhere in
Arizona would probably grant him a degree.              (Don Knuth)



pgsql-hackers by date:

Previous
From: "Drouvot, Bertrand"
Date:
Subject: Re: [PATCH] Expose port->authn_id to extensions and triggers
Next
From: r.zharkov@postgrespro.ru
Date:
Subject: Re: Checking pgwin32_is_junction() errors