Thread: Re: Exists pull-up application with JoinExpr
Yes, you are right) Thank you for your interest to this thread)Hi Alena.Em ter., 24 de dez. de 2024 às 01:44, Alena Rybakina <a.rybakina@postgrespro.ru> escreveu:Hi, hackers!
I found one pull-up that works if the inner join condition is written through the where condition,
create temp table ta (id int primary key, val int); insert into ta values(1,1); insert into ta values(2,2);
insert into ta values(3,3);
create temp table tb (id int primary key, aval int); insert into tb values(4,1); insert into tb values(5,1); insert into tb values(1,2); create temp table tc (id int primary key, aid int); insert into tc values(6,1); insert into tc values(7,2);
EXPLAIN (ANALYZE, COSTS OFF, SUMMARY OFF, TIMING OFF) SELECT * FROM ta WHERE EXISTS (SELECT * FROM tb, tc WHERE ta.id = tb.id);
QUERY PLAN ------------------------------------------------------------------------- Nested Loop Semi Join (actual rows=1 loops=1) Buffers: local hit=6 -> Seq Scan on ta (actual rows=3 loops=1) Buffers: local hit=1 -> Nested Loop (actual rows=0 loops=3) Buffers: local hit=5 -> Index Only Scan using tb_pkey on tb (actual rows=0 loops=3) Index Cond: (id = ta.id) Heap Fetches: 1 Buffers: local hit=4 -> Seq Scan on tc (actual rows=1 loops=1) Buffers: local hit=1 Planning: Buffers: shared hit=67 read=12 (14 rows)but it doesn't work if it is written through the outside condition.
alena@postgres=# EXPLAIN (ANALYZE, COSTS OFF, SUMMARY OFF, TIMING OFF) SELECT * FROM ta WHERE EXISTS (SELECT * FROM tb JOIN tc ON ta.id = tb.id); QUERY PLAN ------------------------------------------------------ Seq Scan on ta (actual rows=1 loops=1) Filter: EXISTS(SubPlan 1) Rows Removed by Filter: 2 Buffers: local hit=5 SubPlan 1 -> Nested Loop (actual rows=0 loops=3) Buffers: local hit=4 -> Seq Scan on tb (actual rows=0 loops=3) Filter: (ta.id = id) Rows Removed by Filter: 3 Buffers: local hit=3 -> Seq Scan on tc (actual rows=1 loops=1) Buffers: local hit=1 Planning: Buffers: shared hit=16 read=9 (15 rows)
I have written a patch to add this functionality and now it gives an query plan:
alena@postgres=# EXPLAIN (ANALYZE, COSTS OFF, SUMMARY OFF, TIMING OFF)
SELECT *
FROM ta
WHERE EXISTS (SELECT *
FROM tb JOIN tc
ON ta.id = tb.id);
QUERY PLAN
-------------------------------------------------------------------------
Nested Loop Semi Join (actual rows=1 loops=1)
Buffers: local hit=6
-> Seq Scan on ta (actual rows=3 loops=1)
Buffers: local hit=1
-> Nested Loop (actual rows=0 loops=3)
Buffers: local hit=5
-> Index Only Scan using tb_pkey on tb (actual rows=0 loops=3)
Index Cond: (id = ta.id)
Heap Fetches: 1
Buffers: local hit=4
-> Seq Scan on tc (actual rows=1 loops=1)
Buffers: local hit=1
(12 rows)tb and tc form a Cartesian product, but in the case of the intersection condition with tuples from the table ta (ta.id = tb.id). So, according to the join condition, tb intersects only with 1, and only it gets into the result, but at the same time they appear twice - this is because of the Cartesian product of tb with tc
How it works:
I rewrote the code a bit so that it considers not only the quals in jointree->quals, but also those in join expression (subselect->jointree->fromlist). If they satisfy the conditions for using pull up, I add them to the list of clauses and form a "Bool" expression from them, joined by an "AND" operation.
I took a look at this patch and I did a little polishing on it.
And I believe that in testing, you need to set it to BUFFERS OFF,
because of the recent change made to ANALYZE.
The tests are failing, like this:QUERY PLAN
-------------------------------------------------------------------------
Nested Loop Semi Join (actual rows=2 loops=1)
+ Buffers: local hit=7
-> Seq Scan on ta (actual rows=2 loops=1)
+ Buffers: local hit=1
-> Nested Loop (actual rows=1 loops=2)
+ Buffers: local hit=6
-> Index Only Scan using tb_pkey on tb (actual rows=1 loops=2)
Index Cond: (id = ta.id)
Heap Fetches: 2
+ Buffers: local hit=4
-> Seq Scan on tc (actual rows=1 loops=2)
-(7 rows)
+ Buffers: local hit=2
+(12 rows)
-- Regards, Alena Rybakina Postgres Professional
Hi Alena, Thank you for your work on subqueries with JOIN. Have you considered the scenario where in subquery includes a qual like (tc.aid = 1)? When I tried executing those queries I receive different results. In my opinion, to prevent this, we should add filters for such quals within the loop 'foreach (lc, all_clauses)' EXPLAIN (ANALYZE, COSTS OFF, SUMMARY OFF, TIMING OFF) SELECT * FROM ta WHERE EXISTS (SELECT * FROM tb, tc WHERE ta.id = tb.id AND tc.aid = 1); QUERY PLAN ---------------------------------------------------------------------- Hash Join (actual rows=1 loops=1) Hash Cond: (ta.id = tb.id) Buffers: local hit=3 -> Seq Scan on ta (actual rows=3 loops=1) Buffers: local hit=1 -> Hash (actual rows=3 loops=1) Buckets: 4096 Batches: 1 Memory Usage: 33kB Buffers: local hit=2 -> HashAggregate (actual rows=3 loops=1) Group Key: tb.id Batches: 1 Memory Usage: 121kB Buffers: local hit=2 -> Nested Loop (actual rows=3 loops=1) Buffers: local hit=2 -> Seq Scan on tb (actual rows=3 loops=1) Buffers: local hit=1 -> Materialize (actual rows=1 loops=3) Storage: Memory Maximum Storage: 17kB Buffers: local hit=1 -> Seq Scan on tc (actual rows=1 loops=1) Filter: (aid = 1) Rows Removed by Filter: 1 Buffers: local hit=1 (23 rows) ============================ EXPLAIN (ANALYZE, COSTS OFF, SUMMARY OFF, TIMING OFF) SELECT * FROM ta WHERE EXISTS (SELECT * FROM tb JOIN tc ON ta.id = tb.id WHERE tc.aid = 1); QUERY PLAN --------------------------------------------------------------------------- Seq Scan on ta (actual rows=1 loops=1) Filter: EXISTS(SubPlan 1) Rows Removed by Filter: 2 Buffers: local hit=6 SubPlan 1 -> Nested Loop (actual rows=0 loops=3) Buffers: local hit=5 -> Index Only Scan using tb_pkey on tb (actual rows=0 loops=3) Index Cond: (id = ta.id) Heap Fetches: 1 Buffers: local hit=4 -> Seq Scan on tc (actual rows=1 loops=1) Filter: (aid = 1) Buffers: local hit=1 (14 rows) -- Best regards, Ilia Evdokimov, Tantor Labs LLC.
Hi! Thank you for your interest to this subject!
Hi Alena,
Thank you for your work on subqueries with JOIN.
Have you considered the scenario where in subquery includes a qual like (tc.aid = 1)? When I tried executing those queries I receive different results. In my opinion, to prevent this, we should add filters for such quals within the loop 'foreach (lc, all_clauses)'
EXPLAIN (ANALYZE, COSTS OFF, SUMMARY OFF, TIMING OFF) SELECT * FROM ta
WHERE EXISTS (SELECT * FROM tb, tc WHERE ta.id = tb.id AND tc.aid = 1);
QUERY PLAN
----------------------------------------------------------------------
Hash Join (actual rows=1 loops=1)
Hash Cond: (ta.id = tb.id)
Buffers: local hit=3
-> Seq Scan on ta (actual rows=3 loops=1)
Buffers: local hit=1
-> Hash (actual rows=3 loops=1)
Buckets: 4096 Batches: 1 Memory Usage: 33kB
Buffers: local hit=2
-> HashAggregate (actual rows=3 loops=1)
Group Key: tb.id
Batches: 1 Memory Usage: 121kB
Buffers: local hit=2
-> Nested Loop (actual rows=3 loops=1)
Buffers: local hit=2
-> Seq Scan on tb (actual rows=3 loops=1)
Buffers: local hit=1
-> Materialize (actual rows=1 loops=3)
Storage: Memory Maximum Storage: 17kB
Buffers: local hit=1
-> Seq Scan on tc (actual rows=1 loops=1)
Filter: (aid = 1)
Rows Removed by Filter: 1
Buffers: local hit=1
(23 rows)
============================
EXPLAIN (ANALYZE, COSTS OFF, SUMMARY OFF, TIMING OFF)
SELECT * FROM ta WHERE EXISTS (SELECT * FROM tb JOIN tc ON ta.id =
tb.id WHERE tc.aid = 1);
QUERY PLAN
---------------------------------------------------------------------------
Seq Scan on ta (actual rows=1 loops=1)
Filter: EXISTS(SubPlan 1)
Rows Removed by Filter: 2
Buffers: local hit=6
SubPlan 1
-> Nested Loop (actual rows=0 loops=3)
Buffers: local hit=5
-> Index Only Scan using tb_pkey on tb (actual rows=0 loops=3)
Index Cond: (id = ta.id)
Heap Fetches: 1
Buffers: local hit=4
-> Seq Scan on tc (actual rows=1 loops=1)
Filter: (aid = 1)
Buffers: local hit=1
(14 rows)
You are right, at the moment the code is not processed if there is a constant qual in the subquery (like t1.x1=1 in the example below) and this problem is not only related to the current patch.
For example you can get such a query plan if you complete this request to the master:
create table t (x int); create table t1 (x1 int); create table t2 (x2 int); EXPLAIN (ANALYZE, COSTS OFF, SUMMARY OFF, TIMING OFF) SELECT 1 FROM t WHERE EXISTS (SELECT 1 FROM t1 where t1.x1 = 1); QUERY PLAN ------------------------------------------------ Result (actual rows=0 loops=1) One-Time Filter: (InitPlan 1).col1 InitPlan 1 -> Seq Scan on t1 (actual rows=0 loops=1) Filter: (x1 = 1) -> Seq Scan on t (never executed) (6 rows)
It's all because of the check in this function - this qual has levelsoup = 0, not 1 (see (!contain_vars_of_level(whereClause, 1)), but I already found out that by changing this, the logic of correction there is required a little more complicated. At the moment, I'm working to add this processing to the patch.
Thanks for the case!
-- Regards, Alena Rybakina Postgres Professional
On 10.02.2025 23:51, Ilia Evdokimov wrote: > > On 09.02.2025 18:14, Alena Rybakina wrote: >> Hi! I found another example where the transformation worked >> incorrectly and reconsidered the idea. >> >> As for conversion of exists_sublink_to_ANY, we need to get the >> flattened implicit-AND list of clauses and pull out the chunks of the >> WHERE clause that belong to the parent query, >> since we are called halfway through the parent's >> preprocess_expression() and earlier steps of preprocess_expression() >> wouldn't get applied to the pulled-up stuff unless we do them here. >> We also do some processing for vars depending on which side the var >> is on - if it's in a subquery, we only need to lower its level >> (varlevel) because subquery will be flatted, while >> for other vars that belong to the parent query, we need to do >> preparation to pull up the sub-select into top range table. >> >> For those expressions that we couldn't assign to either list, we >> define newWhere and apply both cases. >> > > When I run 'make -C contrib/ check', tests of postgres_fdw extension > failed. I might be wrong, but you should be careful with LIMIT. > Thank you for the review, I'm working on it. -- Regards, Alena Rybakina Postgres Professional