BUG #18152: Join condition is not pushed down to union all subquery - Mailing list pgsql-bugs

From PG Bug reporting form
Subject BUG #18152: Join condition is not pushed down to union all subquery
Date
Msg-id 18152-72d0fe42ca4283d9@postgresql.org
Whole thread Raw
Responses Re: BUG #18152: Join condition is not pushed down to union all subquery  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-bugs
The following bug has been logged on the website:

Bug reference:      18152
Logged by:          Lauri Kajan
Email address:      lauri.kajan@gmail.com
PostgreSQL version: 16.0
Operating system:   "PostgreSQL 16.0 (Debian 16.0-1.pgdg110+1) on x86_
Description:

The join condition is not pushed down to a subquery containing UNION ALL
when even a single query within the subquery contains a WHERE clause. This
issue prevents the optimizer from creating a plan that utilizes the index on
the join column. However, when the WHERE clause is moved to the top-level
query, the optimizer produces the desired plan.

EXPLAIN
WITH
  targets AS (
    SELECT 'bike' AS vehicle, id, dealer_name, frame_size FROM bikes WHERE
frame_size = 52
    UNION ALL
    SELECT 'car' AS vehicle, id, dealer_name, null::int AS frame_size FROM
cars
  )
SELECT
  dealers.name dealer,
  targets.vehicle,
  targets.id
FROM
  dealers
  JOIN targets
    ON dealers.name = targets.dealer_name
WHERE
  dealers.id IN (54,12,456)
;
---
Hash Join  (cost=16.91..4587.77 rows=325 width=41) (actual
time=0.270..27.133 rows=275 loops=1)
  Hash Cond: (""*SELECT* 1"".dealer_name = dealers.name)
  ->  Append  (cost=0.00..4026.20 rows=108280 width=41) (actual
time=0.009..20.376 rows=108222 loops=1)
        ->  Subquery Scan on ""*SELECT* 1""  (cost=0.00..1908.80 rows=8280
width=41) (actual time=0.009..6.708 rows=8222 loops=1)
              ->  Seq Scan on bikes  (cost=0.00..1826.00 rows=8280 width=45)
(actual time=0.009..6.114 rows=8222 loops=1)
                    Filter: (frame_size = 52)
                    Rows Removed by Filter: 91778
        ->  Seq Scan on cars  (cost=0.00..1576.00 rows=100000 width=41)
(actual time=0.012..8.433 rows=100000 loops=1)
  ->  Hash  (cost=16.88..16.88 rows=3 width=5) (actual time=0.017..0.018
rows=3 loops=1)
        Buckets: 1024  Batches: 1  Memory Usage: 9kB
        ->  Index Scan using dealers_pkey on dealers  (cost=0.28..16.88
rows=3 width=5) (actual time=0.009..0.015 rows=3 loops=1)
              Index Cond: (id = ANY ('{54,12,456}'::integer[]))
Planning Time: 0.148 ms
Execution Time: 27.170 ms



Moving the WHERE clause to the top-level query produces a more efficient
query execution plan. Example:

EXPLAIN ANALYZE
WITH
  targets AS (
    SELECT 'bike' AS vehicle, id, dealer_name, frame_size FROM bikes
    UNION ALL
    SELECT 'car' AS vehicle, id, dealer_name, null::int AS frame_size FROM
cars
  )
SELECT
  dealers.name dealer,
  targets.vehicle,
  targets.id
FROM
  dealers
  JOIN targets
    ON dealers.name = targets.dealer_name
WHERE
  dealers.id IN (54,12,456)
  AND (frame_size IS null OR frame_size = 52)
;
----
Nested Loop  (cost=5.32..1443.67 rows=325 width=41) (actual
time=0.106..0.815 rows=275 loops=1)
  ->  Index Scan using dealers_pkey on dealers  (cost=0.28..16.88 rows=3
width=5) (actual time=0.019..0.031 rows=3 loops=1)
        Index Cond: (id = ANY ('{54,12,456}'::integer[]))
  ->  Append  (cost=5.04..474.52 rows=108 width=41) (actual
time=0.048..0.250 rows=92 loops=3)
        ->  Bitmap Heap Scan on bikes  (cost=5.04..237.10 rows=8 width=41)
(actual time=0.045..0.123 rows=9 loops=3)
              Recheck Cond: (dealer_name = dealers.name)
              Filter: ((frame_size IS NULL) OR (frame_size = 52))
              Rows Removed by Filter: 91
              Heap Blocks: exact=274
              ->  Bitmap Index Scan on bikes_dealer_name_idx
(cost=0.00..5.04 rows=100 width=0) (actual time=0.010..0.010 rows=100
loops=3)
                    Index Cond: (dealer_name = dealers.name)
        ->  Bitmap Heap Scan on cars  (cost=5.07..236.87 rows=100 width=41)
(actual time=0.027..0.116 rows=82 loops=3)
              Recheck Cond: (dealer_name = dealers.name)
              Heap Blocks: exact=233
              ->  Bitmap Index Scan on cars_dealer_name_idx
(cost=0.00..5.04 rows=100 width=0) (actual time=0.017..0.017 rows=82
loops=3)
                    Index Cond: (dealer_name = dealers.name)
Planning Time: 0.238 ms
Execution Time: 0.863 ms

Unfortunately this workaround may become complex if there are multiple
filters for multiple tables, as each filtered column must be added to every
unioned query.


Here's a script to generate sample data for the example case:

CREATE TABLE dealers AS
SELECT
  id,
  (SELECT string_agg(CHR(65+(random() * 25)::integer), '') FROM
generate_series(1, 4) WHERE id>0) name
FROM generate_series(1, 1000) AS id
;
ALTER TABLE dealers ADD primary key (id);
CREATE INDEX ON dealers(name);

CREATE TABLE bikes AS
SELECT
  generate_series AS id,
  (SELECT name FROM dealers WHERE dealers.id = (SELECT (random()*1000)::int
WHERE generate_series>0)) AS dealer_name,
  (random()*12+50)::int as frame_size
FROM generate_series(1, 100000);
ALTER TABLE bikes ADD primary key (id);
CREATE INDEX ON bikes(dealer_name);

CREATE TABLE cars AS
SELECT
  generate_series as id,
  (SELECT name FROM dealers WHERE dealers.id = (SELECT (random()*1000)::int
WHERE generate_series>0)) AS dealer_name,
  (random()*7+14)::int as wheel_size
FROM generate_series(1, 100000);
ALTER TABLE cars ADD primary key (id);
CREATE INDEX ON cars(dealer_name);


pgsql-bugs by date:

Previous
From: Andres Freund
Date:
Subject: Re: Access to old versions of a row
Next
From: Zhaoxun Yan
Date:
Subject: Re: pg_rewind: ERROR: could not fetch remote file "global/pg_control": ERROR: permission denied