Re: Exists pull-up application with JoinExpr - Mailing list pgsql-hackers

From Alena Rybakina
Subject Re: Exists pull-up application with JoinExpr
Date
Msg-id 46bc4eaf-58c5-42ab-8041-d3380a0768de@postgrespro.ru
Whole thread Raw
List pgsql-hackers
On 24.12.2024 13:25, Ranier Vilela wrote:
Hi Alena.

Em ter., 24 de dez. de 2024 às 01:44, Alena Rybakina <a.rybakina@postgrespro.ru> escreveu:

Hi, hackers!

I found one pull-up that works if the inner join condition is written through the where condition,

create temp table ta (id int primary key, val int);
insert into ta values(1,1);
insert into ta values(2,2);
insert into ta values(3,3);

create temp table tb (id int primary key, aval int);
insert into tb values(4,1);
insert into tb values(5,1);
insert into tb values(1,2);

create temp table tc (id int primary key, aid int);
insert into tc values(6,1);
insert into tc values(7,2);
EXPLAIN (ANALYZE, COSTS OFF, SUMMARY OFF, TIMING OFF) SELECT *   FROM ta  WHERE EXISTS (SELECT *                  FROM tb, tc                  WHERE ta.id = tb.id);                               QUERY PLAN                                
------------------------------------------------------------------------- Nested Loop Semi Join (actual rows=1 loops=1)   Buffers: local hit=6   ->  Seq Scan on ta (actual rows=3 loops=1)         Buffers: local hit=1   ->  Nested Loop (actual rows=0 loops=3)         Buffers: local hit=5         ->  Index Only Scan using tb_pkey on tb (actual rows=0 loops=3)               Index Cond: (id = ta.id)               Heap Fetches: 1               Buffers: local hit=4         ->  Seq Scan on tc (actual rows=1 loops=1)               Buffers: local hit=1 Planning:   Buffers: shared hit=67 read=12
(14 rows)

but it doesn't work if it is written through the outside condition.

alena@postgres=# EXPLAIN (ANALYZE, COSTS OFF, SUMMARY OFF, TIMING OFF) SELECT *   FROM ta  WHERE EXISTS (SELECT *                  FROM tb JOIN tc                  ON ta.id = tb.id);                      QUERY PLAN                      
------------------------------------------------------ Seq Scan on ta (actual rows=1 loops=1)   Filter: EXISTS(SubPlan 1)   Rows Removed by Filter: 2   Buffers: local hit=5   SubPlan 1     ->  Nested Loop (actual rows=0 loops=3)           Buffers: local hit=4           ->  Seq Scan on tb (actual rows=0 loops=3)                 Filter: (ta.id = id)                 Rows Removed by Filter: 3                 Buffers: local hit=3           ->  Seq Scan on tc (actual rows=1 loops=1)                 Buffers: local hit=1 Planning:   Buffers: shared hit=16 read=9
(15 rows)

I have written a patch to add this functionality and now it gives an query plan: alena@postgres=# EXPLAIN (ANALYZE, COSTS OFF, SUMMARY OFF, TIMING OFF)
 SELECT *
   FROM ta
  WHERE EXISTS (SELECT *
                  FROM tb JOIN tc
                  ON ta.id = tb.id);
                     QUERY PLAN                                
-------------------------------------------------------------------------
 Nested Loop Semi Join (actual rows=1 loops=1)
   Buffers: local hit=6
   ->  Seq Scan on ta (actual rows=3 loops=1)
         Buffers: local hit=1
   ->  Nested Loop (actual rows=0 loops=3)
         Buffers: local hit=5
         ->  Index Only Scan using tb_pkey on tb (actual rows=0 loops=3)
               Index Cond: (id = ta.id)
               Heap Fetches: 1
               Buffers: local hit=4
         ->  Seq Scan on tc (actual rows=1 loops=1)
               Buffers: local hit=1
(12 rows)

tb and tc form a Cartesian product, but in the case of the intersection condition with tuples from the table ta (ta.id = tb.id). So, according to the join condition, tb intersects only with 1, and only it gets into the result, but at the same time they appear twice - this is because of the Cartesian product of tb with tc
How it works:

I rewrote the code a bit so that it considers not only the quals in jointree->quals, but also those in join expression (subselect->jointree->fromlist). If they satisfy the conditions for using pull up, I add them to the list of clauses and form a "Bool" expression from them, joined by an "AND" operation.

I took a look at this patch and I did a little polishing on it.

And I believe that in testing, you need to set it to BUFFERS OFF,
because of the recent change made to ANALYZE.

The tests are failing, like this: 
QUERY PLAN
 -------------------------------------------------------------------------
 Nested Loop Semi Join (actual rows=2 loops=1)
+ Buffers: local hit=7
 -> Seq Scan on ta (actual rows=2 loops=1)
+ Buffers: local hit=1
 -> Nested Loop (actual rows=1 loops=2)
+ Buffers: local hit=6
 -> Index Only Scan using tb_pkey on tb (actual rows=1 loops=2)
 Index Cond: (id = ta.id)
 Heap Fetches: 2
+ Buffers: local hit=4
 -> Seq Scan on tc (actual rows=1 loops=2)
-(7 rows)
+ Buffers: local hit=2
+(12 rows)

Yes, you are right) Thank you for your interest to this thread)
-- 
Regards,
Alena Rybakina
Postgres Professional

pgsql-hackers by date:

Previous
From: Vladlen Popolitov
Date:
Subject: Re: Windows UTF8 system locale
Next
From: Michail Nikolaev
Date:
Subject: Re: Revisiting {CREATE INDEX, REINDEX} CONCURRENTLY improvements