Re: BUG #18152: Join condition is not pushed down to union all subquery - Mailing list pgsql-bugs

From Richard Guo
Subject Re: BUG #18152: Join condition is not pushed down to union all subquery
Date
Msg-id CAMbWs48dZo6=coBNC8EUyGRofxAayVM5gSyiRNMfuj5YbYQCyw@mail.gmail.com
Whole thread Raw
In response to Re: BUG #18152: Join condition is not pushed down to union all subquery  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-bugs

On Wed, Oct 11, 2023 at 12:09 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
I think this is less about "can't push down" than "can't pull up",
specifically that prepjointree.c fails to flatten that subquery
into an "appendrel member", per the comments in is_safe_append_member:

Yeah, I agree with you that this is more about "can't pull up".  The
first leaf of the UNION ALL appendrel is a subquery that has WHERE quals
so it fails to be pulled up, as explained in the comments you pointed
out.
 
I don't recall at the moment if there are fundamental reasons not to
have per-child quals in appendrels, or if it could be done with the
application of enough elbow grease.  But it's probably not trivial.
That comment has been there quite awhile.

I'm wondering if we can keep the per-child quals in AppendRelInfos, and
then apply these quals when we create RelOptInfos for the children of an
appendrel, specifically in expand_appendrel_subquery().

I have a go at this and it can fix the origin problem.

EXPLAIN (COSTS OFF)
WITH
  targets AS (
    SELECT 'bike' AS vehicle, id, dealer_name, frame_size FROM bikes WHERE
frame_size = 52
    UNION ALL
    SELECT 'car' AS vehicle, id, dealer_name, null::int AS frame_size FROM
cars
  )
SELECT
  dealers.name dealer,
  targets.vehicle,
  targets.id
FROM
  dealers
  JOIN targets
    ON dealers.name = targets.dealer_name
WHERE
  dealers.id IN (54,12,456);
                          QUERY PLAN
--------------------------------------------------------------
 Nested Loop
   ->  Index Scan using dealers_pkey on dealers
         Index Cond: (id = ANY ('{54,12,456}'::integer[]))
   ->  Append
         ->  Bitmap Heap Scan on bikes
               Recheck Cond: (dealer_name = dealers.name)
               Filter: (frame_size = 52)
               ->  Bitmap Index Scan on bikes_dealer_name_idx
                     Index Cond: (dealer_name = dealers.name)
         ->  Bitmap Heap Scan on cars
               Recheck Cond: (dealer_name = dealers.name)
               ->  Bitmap Index Scan on cars_dealer_name_idx
                     Index Cond: (dealer_name = dealers.name)
(13 rows)

However, when applying the per-child quals in expand_appendrel_subquery,
I cannot find a way to make these quals go through the EC machinery.
And that would cause us to miss some optimal paths, such as

EXPLAIN (COSTS OFF)
SELECT 'bike' AS vehicle, id, dealer_name, frame_size FROM bikes WHERE
frame_size = 52 and frame_size = id
    UNION ALL
SELECT 'car' AS vehicle, id, dealer_name, null::int AS frame_size FROM
cars;
                        QUERY PLAN
-----------------------------------------------------------
 Append
   ->  Seq Scan on bikes
         Filter: ((frame_size = 52) AND (frame_size = id))
   ->  Seq Scan on cars
(4 rows)

We should have been able to get 'bikes.id = 52' from '(frame_size = 52)
AND (frame_size = id)', and then use index scan on 'bikes', but ...

Maybe we can achieve that with more efforts, but I'm not sure if this is
worthwhile.  Any thoughts?

Thanks
Richard

pgsql-bugs by date:

Previous
From: Tom Lane
Date:
Subject: Re: System administration functions about relation size ignore changes in the table structure
Next
From: PG Bug reporting form
Date:
Subject: BUG #18158: Assert in pgstat_report_stat() fails when a backend shutting down with stats pending