BUG #18152: Join condition is not pushed down to union all subquery - Mailing list pgsql-bugs
From | PG Bug reporting form |
---|---|
Subject | BUG #18152: Join condition is not pushed down to union all subquery |
Date | |
Msg-id | 18152-72d0fe42ca4283d9@postgresql.org Whole thread Raw |
Responses |
Re: BUG #18152: Join condition is not pushed down to union all subquery
|
List | pgsql-bugs |
The following bug has been logged on the website: Bug reference: 18152 Logged by: Lauri Kajan Email address: lauri.kajan@gmail.com PostgreSQL version: 16.0 Operating system: "PostgreSQL 16.0 (Debian 16.0-1.pgdg110+1) on x86_ Description: The join condition is not pushed down to a subquery containing UNION ALL when even a single query within the subquery contains a WHERE clause. This issue prevents the optimizer from creating a plan that utilizes the index on the join column. However, when the WHERE clause is moved to the top-level query, the optimizer produces the desired plan. EXPLAIN WITH targets AS ( SELECT 'bike' AS vehicle, id, dealer_name, frame_size FROM bikes WHERE frame_size = 52 UNION ALL SELECT 'car' AS vehicle, id, dealer_name, null::int AS frame_size FROM cars ) SELECT dealers.name dealer, targets.vehicle, targets.id FROM dealers JOIN targets ON dealers.name = targets.dealer_name WHERE dealers.id IN (54,12,456) ; --- Hash Join (cost=16.91..4587.77 rows=325 width=41) (actual time=0.270..27.133 rows=275 loops=1) Hash Cond: (""*SELECT* 1"".dealer_name = dealers.name) -> Append (cost=0.00..4026.20 rows=108280 width=41) (actual time=0.009..20.376 rows=108222 loops=1) -> Subquery Scan on ""*SELECT* 1"" (cost=0.00..1908.80 rows=8280 width=41) (actual time=0.009..6.708 rows=8222 loops=1) -> Seq Scan on bikes (cost=0.00..1826.00 rows=8280 width=45) (actual time=0.009..6.114 rows=8222 loops=1) Filter: (frame_size = 52) Rows Removed by Filter: 91778 -> Seq Scan on cars (cost=0.00..1576.00 rows=100000 width=41) (actual time=0.012..8.433 rows=100000 loops=1) -> Hash (cost=16.88..16.88 rows=3 width=5) (actual time=0.017..0.018 rows=3 loops=1) Buckets: 1024 Batches: 1 Memory Usage: 9kB -> Index Scan using dealers_pkey on dealers (cost=0.28..16.88 rows=3 width=5) (actual time=0.009..0.015 rows=3 loops=1) Index Cond: (id = ANY ('{54,12,456}'::integer[])) Planning Time: 0.148 ms Execution Time: 27.170 ms Moving the WHERE clause to the top-level query produces a more efficient query execution plan. Example: EXPLAIN ANALYZE WITH targets AS ( SELECT 'bike' AS vehicle, id, dealer_name, frame_size FROM bikes UNION ALL SELECT 'car' AS vehicle, id, dealer_name, null::int AS frame_size FROM cars ) SELECT dealers.name dealer, targets.vehicle, targets.id FROM dealers JOIN targets ON dealers.name = targets.dealer_name WHERE dealers.id IN (54,12,456) AND (frame_size IS null OR frame_size = 52) ; ---- Nested Loop (cost=5.32..1443.67 rows=325 width=41) (actual time=0.106..0.815 rows=275 loops=1) -> Index Scan using dealers_pkey on dealers (cost=0.28..16.88 rows=3 width=5) (actual time=0.019..0.031 rows=3 loops=1) Index Cond: (id = ANY ('{54,12,456}'::integer[])) -> Append (cost=5.04..474.52 rows=108 width=41) (actual time=0.048..0.250 rows=92 loops=3) -> Bitmap Heap Scan on bikes (cost=5.04..237.10 rows=8 width=41) (actual time=0.045..0.123 rows=9 loops=3) Recheck Cond: (dealer_name = dealers.name) Filter: ((frame_size IS NULL) OR (frame_size = 52)) Rows Removed by Filter: 91 Heap Blocks: exact=274 -> Bitmap Index Scan on bikes_dealer_name_idx (cost=0.00..5.04 rows=100 width=0) (actual time=0.010..0.010 rows=100 loops=3) Index Cond: (dealer_name = dealers.name) -> Bitmap Heap Scan on cars (cost=5.07..236.87 rows=100 width=41) (actual time=0.027..0.116 rows=82 loops=3) Recheck Cond: (dealer_name = dealers.name) Heap Blocks: exact=233 -> Bitmap Index Scan on cars_dealer_name_idx (cost=0.00..5.04 rows=100 width=0) (actual time=0.017..0.017 rows=82 loops=3) Index Cond: (dealer_name = dealers.name) Planning Time: 0.238 ms Execution Time: 0.863 ms Unfortunately this workaround may become complex if there are multiple filters for multiple tables, as each filtered column must be added to every unioned query. Here's a script to generate sample data for the example case: CREATE TABLE dealers AS SELECT id, (SELECT string_agg(CHR(65+(random() * 25)::integer), '') FROM generate_series(1, 4) WHERE id>0) name FROM generate_series(1, 1000) AS id ; ALTER TABLE dealers ADD primary key (id); CREATE INDEX ON dealers(name); CREATE TABLE bikes AS SELECT generate_series AS id, (SELECT name FROM dealers WHERE dealers.id = (SELECT (random()*1000)::int WHERE generate_series>0)) AS dealer_name, (random()*12+50)::int as frame_size FROM generate_series(1, 100000); ALTER TABLE bikes ADD primary key (id); CREATE INDEX ON bikes(dealer_name); CREATE TABLE cars AS SELECT generate_series as id, (SELECT name FROM dealers WHERE dealers.id = (SELECT (random()*1000)::int WHERE generate_series>0)) AS dealer_name, (random()*7+14)::int as wheel_size FROM generate_series(1, 100000); ALTER TABLE cars ADD primary key (id); CREATE INDEX ON cars(dealer_name);
pgsql-bugs by date: