Re: Pathify RHS unique-ification for semijoin planning - Mailing list pgsql-hackers

From Álvaro Herrera
Subject Re: Pathify RHS unique-ification for semijoin planning
Date
Msg-id 202507230811.ueghjzzloxkm@alvherre.pgsql
Whole thread Raw
In response to Re: Pathify RHS unique-ification for semijoin planning  (Richard Guo <guofenglinux@gmail.com>)
Responses Re: Pathify RHS unique-ification for semijoin planning
List pgsql-hackers
Hello,

As a very trivial test on this patch, I ran the query in your opening
email, both with and without the patch, scaling up the size of the table
a little bit.  So I did this

    drop table if exists t;
    create table t(a int, b int);
    insert into t select i % 100000, i from generate_series(1,1e7) i;
    create index on t(a);
    vacuum analyze t;

    set enable_hashagg to off;

    explain (costs off, analyze, buffers)
    select * from t t1 where t1.a in
      (select a from t t2 where a < 10000)
    order by t1.a;


This is the plan without the patch:

                                                    QUERY PLAN                                                     
───────────────────────────────────────────────────────────────────────────────────────────────────────────────────
 Merge Join (actual time=289.262..700.761 rows=1000000.00 loops=1)
   Merge Cond: (t1.a = t2.a)
   Buffers: shared hit=1017728 read=3945 written=3361, temp read=1471 written=1476
   ->  Index Scan using t_a_idx on t t1 (actual time=0.011..320.747 rows=1000001.00 loops=1)
         Index Searches: 1
         Buffers: shared hit=997725 read=3112 written=2664
   ->  Sort (actual time=219.273..219.771 rows=10000.00 loops=1)
         Sort Key: t2.a
         Sort Method: quicksort  Memory: 385kB
         Buffers: shared hit=20003 read=833 written=697, temp read=1471 written=1476
         ->  Unique (actual time=128.173..218.708 rows=10000.00 loops=1)
               Buffers: shared hit=20003 read=833 written=697, temp read=1471 written=1476
               ->  Sort (actual time=128.170..185.461 rows=1000000.00 loops=1)
                     Sort Key: t2.a
                     Sort Method: external merge  Disk: 11768kB
                     Buffers: shared hit=20003 read=833 written=697, temp read=1471 written=1476
                     ->  Index Only Scan using t_a_idx on t t2 (actual time=0.024..74.171 rows=1000000.00 loops=1)
                           Index Cond: (a < 10000)
                           Heap Fetches: 0
                           Index Searches: 1
                           Buffers: shared hit=20003 read=833 written=697
 Planning:
   Buffers: shared hit=28 read=7
 Planning Time: 0.212 ms
 Execution Time: 732.840 ms


and this is the plan with the patch:

                                              QUERY PLAN                                               
───────────────────────────────────────────────────────────────────────────────────────────────────────
 Merge Join (actual time=70.310..595.116 rows=1000000.00 loops=1)
   Merge Cond: (t1.a = t2.a)
   Buffers: shared hit=1017750 read=3923 written=3586
   ->  Index Scan using t_a_idx on t t1 (actual time=0.020..341.257 rows=1000001.00 loops=1)
         Index Searches: 1
         Buffers: shared hit=996914 read=3923 written=3586
   ->  Unique (actual time=0.028..99.074 rows=10000.00 loops=1)
         Buffers: shared hit=20836
         ->  Index Only Scan using t_a_idx on t t2 (actual time=0.026..66.219 rows=1000000.00 loops=1)
               Index Cond: (a < 10000)
               Heap Fetches: 0
               Index Searches: 1
               Buffers: shared hit=20836
 Planning:
   Buffers: shared hit=55 read=15 written=14
 Planning Time: 0.391 ms
 Execution Time: 621.377 ms


This is a really nice improvement.  I think we could find queries that
are arbitrarily faster, by feeding enough tuples to the unnecessary Sort
nodes.

-- 
Álvaro Herrera         PostgreSQL Developer  —  https://www.EnterpriseDB.com/
"No necesitamos banderas
 No reconocemos fronteras"                  (Jorge González)



pgsql-hackers by date:

Previous
From: Michael Paquier
Date:
Subject: Re: Custom pgstat support performance regression for simple queries
Next
From: Álvaro Herrera
Date:
Subject: Re: Proposal: QUALIFY clause