Re: pull-up subquery if JOIN-ON contains refs to upper-query - Mailing list pgsql-hackers

From Alena Rybakina
Subject Re: pull-up subquery if JOIN-ON contains refs to upper-query
Date
Msg-id e172de23-7acb-4330-a40a-58ead7a47944@postgrespro.ru
Whole thread Raw
In response to Re: pull-up subquery if JOIN-ON contains refs to upper-query  (Alena Rybakina <a.rybakina@postgrespro.ru>)
Responses Re: pull-up subquery if JOIN-ON contains refs to upper-query
List pgsql-hackers
Hi!

My colleague reviewed my patch and gave feedback on how to improve it - 
for some queries with data types that I did not consider, pull-up is not 
applied, although it should. Some of them:

EXPLAIN (ANALYZE, COSTS OFF, SUMMARY OFF, TIMING OFF, BUFFERS OFF)
SELECT 1
   FROM ta
  WHERE EXISTS (SELECT 1
                  FROM tb
                  JOIN tc
                    ON ta.id = tb.id
                   AND tb.id = ANY('{1}'::int[])
               );

                                 QUERY PLAN
--------------------------------------------------------------------------
  Seq Scan on ta (actual rows=1.00 loops=1)
    Filter: EXISTS(SubPlan 1)
    Rows Removed by Filter: 1
    SubPlan 1
      ->  Nested Loop (actual rows=0.50 loops=2)
            ->  Seq Scan on tb (actual rows=0.50 loops=2)
                  Filter: ((id = ANY ('{1}'::integer[])) AND (ta.id = id))
                  Rows Removed by Filter: 2
            ->  Seq Scan on tc (actual rows=1.00 loops=1)

EXPLAIN (ANALYZE, COSTS OFF, SUMMARY OFF, TIMING OFF, BUFFERS OFF)
  SELECT 1
    FROM ta
   WHERE EXISTS (SELECT 1
                   FROM tb
                   JOIN tc
                     ON ta.id = tb.id
                    AND tb.is_active
                );
                        QUERY PLAN
---------------------------------------------------------
  Seq Scan on ta (actual rows=2.00 loops=1)
    Filter: EXISTS(SubPlan 1)
    SubPlan 1
      ->  Nested Loop (actual rows=1.00 loops=2)
            ->  Seq Scan on tb (actual rows=1.00 loops=2)
                  Filter: (is_active AND (ta.id = id))
                  Rows Removed by Filter: 0
            ->  Seq Scan on tc (actual rows=1.00 loops=2)

EXPLAIN (ANALYZE, COSTS OFF, SUMMARY OFF, TIMING OFF, BUFFERS OFF)
  SELECT 1
    FROM ta
   WHERE EXISTS (SELECT 1
                   FROM tb
                   JOIN tc
                     ON ta.id = tb.id
                    AND tb.is_active IS NOT NULL
                );

                                    QUERY PLAN
--------------------------------------------------------------------
  Seq Scan on ta (actual rows=2.00 loops=1)
    Filter: EXISTS(SubPlan 1)
     SubPlan 1
        ->  Nested Loop (actual rows=1.00 loops=2)
              ->  Seq Scan on tb (actual rows=1.00 loops=2)
                    Filter: ((is_active IS NOT NULL) AND (ta.id = id))
                    Rows Removed by Filter: 0
              ->  Seq Scan on tc (actual rows=1.00 loops=2)

UPDATE tb SET is_active = NULL WHERE id = 2;

EXPLAIN (ANALYZE, COSTS OFF, SUMMARY OFF, TIMING OFF, BUFFERS OFF)
  SELECT 1
    FROM ta
   WHERE EXISTS (SELECT 1
                   FROM tb
                   JOIN tc
                     ON ta.id = tb.id
                    AND tb.is_active IS NULL
                );
                               QUERY PLAN
----------------------------------------------------------------
  Seq Scan on ta (actual rows=1.00 loops=1)
    Filter: EXISTS(SubPlan 1)
    Rows Removed by Filter: 1
    SubPlan 1
      ->  Nested Loop (actual rows=0.50 loops=2)
            ->  Seq Scan on tb (actual rows=0.50 loops=2)
                  Filter: ((is_active IS NULL) AND (ta.id = id))
                  Rows Removed by Filter: 4
            ->  Seq Scan on tc (actual rows=1.00 loops=1)

I see that I need to add a walker that, when traversing the tree, 
determines whether there are conditions under which pull-up is 
impossible - the presence of
volatility of functions and other restrictions, and leave the 
transformation for the var objects that I added before, I described it 
here.

Unfortunately, I need a few days to implement this and need time for a 
review, and I think I will not have time to do this before the code 
freeze, so
I am moving this to the next commitfest and not changing the status 
"awaiting the author".

On 11.02.2025 18:59, Alena Rybakina wrote:
> On 10.02.2025 23:51, Ilia Evdokimov wrote:
>>
>> On 09.02.2025 18:14, Alena Rybakina wrote:
>>> Hi! I found another example where the transformation worked 
>>> incorrectly and reconsidered the idea.
>>>
>>> As for conversion of exists_sublink_to_ANY, we need to get the 
>>> flattened implicit-AND list of clauses and pull out the chunks of 
>>> the WHERE clause that belong to the parent query,
>>> since we are called halfway through the parent's 
>>> preprocess_expression() and earlier steps of preprocess_expression() 
>>> wouldn't get applied to the pulled-up stuff unless we do them here.
>>> We also do some processing for vars depending on which side the var 
>>> is on - if it's in a subquery, we only need to lower its level 
>>> (varlevel) because subquery will be flatted, while
>>> for other vars that belong to the parent query, we need to do 
>>> preparation to pull up the sub-select into top range table.
>>>
>>> For those expressions that we couldn't assign to either list, we 
>>> define newWhere and apply both cases.
>>>
>>
>> When I run 'make -C contrib/ check', tests of postgres_fdw extension 
>> failed. I might be wrong, but you should be careful with LIMIT.
>>
> Thank you for the review, I'm working on it.
>
Sorry for not responding, but I will fix this bug after I update the 
code based on the comments above. Thank you for noticing and writing to 
me, your feedback is very important.

-- 
Regards,
Alena Rybakina
Postgres Professional




pgsql-hackers by date:

Previous
From: Peter Geoghegan
Date:
Subject: Re: BTScanOpaqueData size slows down tests
Next
From: Tomas Vondra
Date:
Subject: Re: Draft for basic NUMA observability