Re: pull-up subquery if JOIN-ON contains refs to upper-query - Mailing list pgsql-hackers
From | Alena Rybakina |
---|---|
Subject | Re: pull-up subquery if JOIN-ON contains refs to upper-query |
Date | |
Msg-id | e172de23-7acb-4330-a40a-58ead7a47944@postgrespro.ru Whole thread Raw |
In response to | Re: pull-up subquery if JOIN-ON contains refs to upper-query (Alena Rybakina <a.rybakina@postgrespro.ru>) |
Responses |
Re: pull-up subquery if JOIN-ON contains refs to upper-query
|
List | pgsql-hackers |
Hi! My colleague reviewed my patch and gave feedback on how to improve it - for some queries with data types that I did not consider, pull-up is not applied, although it should. Some of them: EXPLAIN (ANALYZE, COSTS OFF, SUMMARY OFF, TIMING OFF, BUFFERS OFF) SELECT 1 FROM ta WHERE EXISTS (SELECT 1 FROM tb JOIN tc ON ta.id = tb.id AND tb.id = ANY('{1}'::int[]) ); QUERY PLAN -------------------------------------------------------------------------- Seq Scan on ta (actual rows=1.00 loops=1) Filter: EXISTS(SubPlan 1) Rows Removed by Filter: 1 SubPlan 1 -> Nested Loop (actual rows=0.50 loops=2) -> Seq Scan on tb (actual rows=0.50 loops=2) Filter: ((id = ANY ('{1}'::integer[])) AND (ta.id = id)) Rows Removed by Filter: 2 -> Seq Scan on tc (actual rows=1.00 loops=1) EXPLAIN (ANALYZE, COSTS OFF, SUMMARY OFF, TIMING OFF, BUFFERS OFF) SELECT 1 FROM ta WHERE EXISTS (SELECT 1 FROM tb JOIN tc ON ta.id = tb.id AND tb.is_active ); QUERY PLAN --------------------------------------------------------- Seq Scan on ta (actual rows=2.00 loops=1) Filter: EXISTS(SubPlan 1) SubPlan 1 -> Nested Loop (actual rows=1.00 loops=2) -> Seq Scan on tb (actual rows=1.00 loops=2) Filter: (is_active AND (ta.id = id)) Rows Removed by Filter: 0 -> Seq Scan on tc (actual rows=1.00 loops=2) EXPLAIN (ANALYZE, COSTS OFF, SUMMARY OFF, TIMING OFF, BUFFERS OFF) SELECT 1 FROM ta WHERE EXISTS (SELECT 1 FROM tb JOIN tc ON ta.id = tb.id AND tb.is_active IS NOT NULL ); QUERY PLAN -------------------------------------------------------------------- Seq Scan on ta (actual rows=2.00 loops=1) Filter: EXISTS(SubPlan 1) SubPlan 1 -> Nested Loop (actual rows=1.00 loops=2) -> Seq Scan on tb (actual rows=1.00 loops=2) Filter: ((is_active IS NOT NULL) AND (ta.id = id)) Rows Removed by Filter: 0 -> Seq Scan on tc (actual rows=1.00 loops=2) UPDATE tb SET is_active = NULL WHERE id = 2; EXPLAIN (ANALYZE, COSTS OFF, SUMMARY OFF, TIMING OFF, BUFFERS OFF) SELECT 1 FROM ta WHERE EXISTS (SELECT 1 FROM tb JOIN tc ON ta.id = tb.id AND tb.is_active IS NULL ); QUERY PLAN ---------------------------------------------------------------- Seq Scan on ta (actual rows=1.00 loops=1) Filter: EXISTS(SubPlan 1) Rows Removed by Filter: 1 SubPlan 1 -> Nested Loop (actual rows=0.50 loops=2) -> Seq Scan on tb (actual rows=0.50 loops=2) Filter: ((is_active IS NULL) AND (ta.id = id)) Rows Removed by Filter: 4 -> Seq Scan on tc (actual rows=1.00 loops=1) I see that I need to add a walker that, when traversing the tree, determines whether there are conditions under which pull-up is impossible - the presence of volatility of functions and other restrictions, and leave the transformation for the var objects that I added before, I described it here. Unfortunately, I need a few days to implement this and need time for a review, and I think I will not have time to do this before the code freeze, so I am moving this to the next commitfest and not changing the status "awaiting the author". On 11.02.2025 18:59, Alena Rybakina wrote: > On 10.02.2025 23:51, Ilia Evdokimov wrote: >> >> On 09.02.2025 18:14, Alena Rybakina wrote: >>> Hi! I found another example where the transformation worked >>> incorrectly and reconsidered the idea. >>> >>> As for conversion of exists_sublink_to_ANY, we need to get the >>> flattened implicit-AND list of clauses and pull out the chunks of >>> the WHERE clause that belong to the parent query, >>> since we are called halfway through the parent's >>> preprocess_expression() and earlier steps of preprocess_expression() >>> wouldn't get applied to the pulled-up stuff unless we do them here. >>> We also do some processing for vars depending on which side the var >>> is on - if it's in a subquery, we only need to lower its level >>> (varlevel) because subquery will be flatted, while >>> for other vars that belong to the parent query, we need to do >>> preparation to pull up the sub-select into top range table. >>> >>> For those expressions that we couldn't assign to either list, we >>> define newWhere and apply both cases. >>> >> >> When I run 'make -C contrib/ check', tests of postgres_fdw extension >> failed. I might be wrong, but you should be careful with LIMIT. >> > Thank you for the review, I'm working on it. > Sorry for not responding, but I will fix this bug after I update the code based on the comments above. Thank you for noticing and writing to me, your feedback is very important. -- Regards, Alena Rybakina Postgres Professional
pgsql-hackers by date: