Thread: BUG #18854: PostgreSQL chooses a suboptimal execution plan when using a specific WHERE filter
BUG #18854: PostgreSQL chooses a suboptimal execution plan when using a specific WHERE filter
The following bug has been logged on the website: Bug reference: 18854 Logged by: Yoni Sade Email address: yonisade83@gmail.com PostgreSQL version: 16.8 Operating system: RDS Linux Description: Query has a specific WHERE filter for tiny lookup table "t8" (19 rows) which generates an execution plan which has two "Nested Loop" steps instead of "Hash Join" steps which makes the query finish after 90 seconds (and spiking the server CPU when being ran by dozens of sessions simultaneously) instead of ~3 seconds (when commenting out that specific WHERE filter clause I mentioned below): Query #1 (runtime: ~90 seconds):: SELECT count(*) AS total FROM ( SELECT 1 FROM v_m JOIN (SELECT *, generate_series(mv.start_year, mv.end_year) as year FROM mv) t3 on v_m.m_id = t3.m_id LEFT JOIN t9 ON t3.mm_id = t9.mm_id and t3.year = t9.year LEFT JOIN v_mm ON t3.mm_id = v_mm.mm_id LEFT JOIN t8 ON v_m.id = t8.mm_f_id LEFT JOIN t7 ON t8.sm_f_id = t7.f_id WHERE v_m.is_exists = true AND t3.is_active = true AND v_m.mk_id IN (7) AND t3.year >= 2021 AND t3.year <= 2024 AND t8.sm_f_id IN (8) -- this specific clause slows down the query GROUP BY v_m.m_id, v_m.m_h, t3.make, v_m.img, COALESCE(t9.is_new, false), t3.year) T; Execution plan #1: Node Type Entity Cost Rows Time Condition Aggregate [NULL] 44501.78 - 44501.79 1 [NULL] [NULL] Group [NULL] 44501.48 - 44501.68 8 [NULL] [NULL] Incremental Sort [NULL] 44501.48 - 44501.60 8 [NULL] [NULL] Nested Loop [NULL] 29224.92 - 44501.36 8 [NULL] [NULL] Nested Loop [NULL] 29224.92 - 44486.38 4 [NULL] [NULL] Nested Loop [NULL] 2048.29 - 13628.25 1 [NULL] [NULL] Nested Loop [NULL] 2047.87 - 13623.80 1 [NULL] [NULL] Nested Loop [NULL] 1047.87 - 1060.89 1 [NULL] COALESCE(((count(*)) > 0), FALSE) Index Scan t1 0.27 - 8.29 1 [NULL] (mk_id = 7) Aggregate [NULL] 1047.60 - 1049.60 200 [NULL] [NULL] Seq Scan t2 0.00 - 993.28 10864 [NULL] is_active Subquery Scan [NULL] 1000.00 - 12562.73 15 [NULL] "((t3.year >= 2021) AND (t3.year <= 2024))" Gather [NULL] 1000.00 - 12516.33 3093 [NULL] [NULL] ProjectSet [NULL] 0.00 - 11207.03 1819000 [NULL] [NULL] Index Only Scan t9 0.42 - 4.44 1 [NULL] "((mm_id = t3.mm_id) AND (YEAR = t3.year))" Hash Join [NULL] 27176.63 - 30848.49 771 [NULL] (t4.f_id = t6.mm_f_id) Unique [NULL] 27175.38 - 28718.18 154280 [NULL] [NULL] Sort [NULL] 27175.38 - 27561.08 154280 [NULL] [NULL] Append [NULL] 16.07 - 3859.65 154280 [NULL] [NULL] Hash Join [NULL] 16.07 - 773.18 38220 [NULL] (t4.f_id = t7.f_id) Hash Join [NULL] 15.85 - 2315.07 116060 [NULL] (t5.t_id = t.t_id) Hash [NULL] 1.24 - 1.24 1 [NULL] [NULL] Seq Scan t8 0.00 - 1.24 1 [NULL] (sm_f_id = 8) Materialize [NULL] 0.00 - 14.88 2 [NULL] [NULL] Seq Scan t7 0.00 - 14.88 2 [NULL] (f_id = 8) Query #2 (runtime: ~3 seconds): SELECT count(*) AS total FROM ( SELECT 1 FROM v_m JOIN (SELECT *, generate_series(mv.start_year, mv.end_year) as year FROM mv) t3 on v_m.m_id = t3.m_id LEFT JOIN t9 ON t3.mm_id = t9.mm_id and t3.year = t9.year LEFT JOIN v_mm ON t3.mm_id = v_mm.mm_id LEFT JOIN t8 ON v_m.id = t8.mm_f_id LEFT JOIN t7 ON t8.sm_f_id = t7.f_id WHERE v_m.is_exists = true AND t3.is_active = true AND v_m.mk_id IN (7) AND t3.year >= 2021 AND t3.year <= 2024 GROUP BY v_m.m_id, v_m.m_h, t3.make, v_m.img, COALESCE(t9.is_new, false), t3.year) T; Execution plan #2: Node Type Entity Cost Rows Time Condition Aggregate [NULL] 44501.78 - 44501.79 1 [NULL] [NULL] Group [NULL] 44501.48 - 44501.68 8 [NULL] [NULL] Incremental Sort [NULL] 44501.48 - 44501.60 8 [NULL] [NULL] Nested Loop [NULL] 29224.92 - 44501.36 8 [NULL] [NULL] Nested Loop [NULL] 29224.92 - 44486.38 4 [NULL] [NULL] Nested Loop [NULL] 2048.29 - 13628.25 1 [NULL] [NULL] Nested Loop [NULL] 2047.87 - 13623.80 1 [NULL] [NULL] Nested Loop [NULL] 1047.87 - 1060.89 1 [NULL] COALESCE(((count(*)) > 0), FALSE) Index Scan t1 0.27 - 8.29 1 [NULL] (mk_id = 7) Aggregate [NULL] 1047.60 - 1049.60 200 [NULL] [NULL] Seq Scan t2 0.00 - 993.28 10864 [NULL] is_active Subquery Scan [NULL] 1000.00 - 12562.73 15 [NULL] "((t3.year >= 2021) AND (t3.year <= 2024))" Gather [NULL] 1000.00 - 12516.33 3093 [NULL] [NULL] ProjectSet [NULL] 0.00 - 11207.03 1819000 [NULL] [NULL] Index Only Scan t9 0.42 - 4.44 1 [NULL] "((mm_id = t3.mm_id) AND (YEAR = t3.year))" Hash Join [NULL] 27176.63 - 30848.49 771 [NULL] (t4.f_id = t6.mm_f_id) Unique [NULL] 27175.38 - 28718.18 154280 [NULL] [NULL] Sort [NULL] 27175.38 - 27561.08 154280 [NULL] [NULL] Append [NULL] 16.07 - 3859.65 154280 [NULL] [NULL] Hash Join [NULL] 16.07 - 773.18 38220 [NULL] (t4.f_id = t7.f_id) Hash Join [NULL] 15.85 - 2315.07 116060 [NULL] (t5.t_id = t.t_id) Hash [NULL] 1.24 - 1.24 1 [NULL] [NULL] Seq Scan t8 0.00 - 1.24 1 [NULL] (sm_f_id = 8) Materialize [NULL] 0.00 - 14.88 2 [NULL] [NULL] Seq Scan t7 0.00 - 14.88 2 [NULL] (f_id = 8)
Re: BUG #18854: PostgreSQL chooses a suboptimal execution plan when using a specific WHERE filter
On 3/18/25 12:34, PG Bug reporting form wrote: > Query has a specific WHERE filter for tiny lookup table "t8" (19 rows) which > generates an execution plan which has two "Nested Loop" steps instead of > "Hash Join" steps which makes the query finish after 90 seconds (and spiking > the server CPU when being ran by dozens of sessions simultaneously) instead > of ~3 seconds (when commenting out that specific WHERE filter clause I > mentioned below): May you provide EXPLAIN ANALYZE VERBOSE in standard text output format? -- regards, Andrei Lepikhov
Re: BUG #18854: PostgreSQL chooses a suboptimal execution plan when using a specific WHERE filter
I would like to add that commenting out that WHERE clause doesn't work for all of this query variations so I've found another workaround for now ("LIMIT 1000000" in bold)
to make the optimizer better estimate the cardinality of the subquery and change the plan to a better one:
This is the previous (suboptimal) plan in the format you asked for (with table/column names obfuscated):
Aggregate (cost=44501.78..44501.79 rows=1 width=8) (actual time=140784.531..140784.633 rows=1 loops=1)
Output: count(*)
-> Group (cost=44501.48..44501.68 rows=8 width=679) (actual time=4656.869..140784.563 rows=153 loops=1)
Output: NULL::integer, t1.***********, t1.***********, t3.***********, t1.***********, (COALESCE(p.is_new, false)), t3.year
Group Key: t1.***********, t3.***********, (COALESCE(p.is_new, false)), t3.year
-> Incremental Sort (cost=44501.48..44501.60 rows=8 width=675) (actual time=4656.866..140784.160 rows=1994 loops=1)
Output: t1.***********, t3.***********, (COALESCE(p.is_new, false)), t3.***********, t1.***********, t1.***********
Sort Key: t1.***********, t3.***********, (COALESCE(p.is_new, false)), t3.year
Presorted Key: t1.m_id
Full-sort Groups: 24 Sort Method: quicksort Average Memory: 30kB Peak Memory: 30kB
Pre-sorted Groups: 15 Sort Method: quicksort Average Memory: 34kB Peak Memory: 34kB
-> Nested Loop Left Join (cost=29224.92..44501.36 rows=8 width=675) (actual time=239.881..140777.492 rows=1994 loops=1)
Output: t1.***********, t3.***********, COALESCE(p.is_new, false), t3.***********, t1.***********, t1.***********
-> Nested Loop (cost=29224.92..44486.38 rows=4 width=679) (actual time=239.855..140771.693 rows=1994 loops=1)
Output: t1.***********, t1.***********, t1.***********, t3.***********, t3.***********, t9.***********, t8.***********
Join Filter: (t3.mm_id = t4.mm_id)
Rows Removed by Join Filter: 33142274
-> Nested Loop Left Join (cost=2048.29..13628.25 rows=1 width=679) (actual time=13.032..1942.680 rows=1994 loops=1)
Output: t1.***********, t1.***********, t1.***********, t3.***********, t3.***********, t3.***********, t9.***********
-> Nested Loop (cost=2047.87..13623.80 rows=1 width=678) (actual time=13.010..1906.470 rows=1650 loops=1)
Output: t1.***********, t1.***********, t1.***********, t3.***********, t3.***********, t3.***********
Join Filter: (t1.m_id = t3.m_id)
Rows Removed by Join Filter: 632234
-> Nested Loop Left Join (cost=1047.87..1060.89 rows=1 width=552) (actual time=11.398..19.784 rows=37 loops=1)
Output: t1.***********, t1.***********, t1.***********
Inner Unique: true
Join Filter: (t1.m_id = t2.m_id)
Rows Removed by Join Filter: 29232
Filter: COALESCE(((count(*)) > 0), false)
Rows Removed by Filter: 12
-> Index Scan using mds_mk_id_m_id_key on mds m (cost=0.27..8.29 rows=1 width=552) (actual time=0.023..0.223 rows=49 loops=1)
Output: t1.***********, t1.***********, t1.***********, t1.***********, t1.***********, t1.***********, t1.***********, t1.***********, t1.***********, t1.***********, t1.***********, t1.***********, t1.***********
Index Cond: (t1.mk_id = 7)
-> HashAggregate (cost=1047.60..1049.60 rows=200 width=12) (actual time=0.234..0.360 rows=597 loops=49)
Output: count(*), t2.m_id
Group Key: t2.m_id
Batches: 1 Memory Usage: 209kB
-> Seq Scan on t2 (cost=0.00..993.28 rows=10864 width=4) (actual time=0.040..7.570 rows=20717 loops=1)
Output: t2.***********, t2.***********, t2.***********, t2.***********, t2.***********, t2.***********, t2.***********, t2.***********, t2.***********, t2.***********, t2.***********, t2.***********, t2.***********, t2.***********, t2.***********, t2.***********, t2.***********, t2.***********, t2.***********, t2.***********, t2.***********, t2.***********, t2.***********, t2.***********, t2.***********, t2.***********, t2.***********, t2.***********, t2.***********
Filter: t2.is_active
Rows Removed by Filter: 11985
-> Subquery Scan on t3 (cost=1000.00..12562.73 rows=15 width=130) (actual time=1.405..49.852 rows=17132 loops=37)
Output: t3.***********, t3.***********, t3.***********, t3.***********, t3.***********, t3.***********, t3.***********, t3.***********, t3.***********, t3.***********, t3.***********, t3.***********, t3.***********, t3.***********, t3.***********, t3.***********, t3.***********, t3.***********, t3.***********, t3.***********, t3.***********, t3.***********, t3.***********, t3.***********, t3.***********, t3.***********, t3.***********, t3.***********, t3.***********, t3.***********, t3.***********, t3.***********, t3.***********, t3.***********, t3.***********, t3.***********, t3.***********, t3.***********, t3.***********, t3.***********, t3.***********, t3.***********, t3.***********, t3.***********, t3.***********, t3.***********, t3.***********, t3.***********, t3.***********, t3.***********, t3.***********, t3.***********, t3.***********, t3.***********
Filter: ((t3.year >= 2021) AND (t3.year <= 2024))
Rows Removed by Filter: 44378
-> Gather (cost=1000.00..12516.33 rows=3093 width=2335) (actual time=1.365..44.739 rows=61510 loops=37)
Output: NULL::timestamp without time zone, NULL::timestamp without time zone, mv.***********, NULL::text, NULL::integer, mv.***********, NULL::character varying(50), mv.***********, NULL::character varying(255), NULL::integer, NULL::character varying, NULL::integer, NULL::character varying(100), NULL::integer, NULL::character varying(50), NULL::integer, NULL::character varying(50), NULL::integer, NULL::character varying(50), NULL::integer, NULL::character varying(50), NULL::integer, NULL::integer, NULL::integer, NULL::character varying(50), NULL::integer, NULL::character varying(50), NULL::integer, NULL::character varying(50), NULL::integer, NULL::character varying(50), NULL::integer, NULL::text, NULL::text, NULL::integer, NULL::integer, NULL::integer, NULL::integer, NULL::boolean, NULL::boolean, NULL::boolean, NULL::boolean, NULL::integer[], NULL::text[], NULL::text[], NULL::text, NULL::text, NULL::boolean, NULL::integer, NULL::integer, NULL::integer, NULL::timestamp without time zone, NULL::timestamp without time zone, (generate_series(mv.s_year, mv.e_year))
Workers Planned: 1
Workers Launched: 1
-> ProjectSet (cost=0.00..11207.03 rows=1819000 width=2335) (actual time=0.034..25.766 rows=30755 loops=74)
Output: NULL::timestamp without time zone, NULL::timestamp without time zone, mv.***********, NULL::text, NULL::integer, mv.***********, NULL::character varying(50), mv.***********, NULL::character varying(255), NULL::integer, NULL::character varying, NULL::integer, NULL::character varying(100), NULL::integer, NULL::character varying(50), NULL::integer, NULL::character varying(50), NULL::integer, NULL::character varying(50), NULL::integer, NULL::character varying(50), NULL::integer, NULL::integer, NULL::integer, NULL::character varying(50), NULL::integer, NULL::character varying(50), NULL::integer, NULL::character varying(50), NULL::integer, NULL::character varying(50), NULL::integer, NULL::text, NULL::text, NULL::integer, NULL::integer, NULL::integer, NULL::integer, NULL::boolean, NULL::boolean, NULL::boolean, NULL::boolean, NULL::integer[], NULL::text[], NULL::text[], NULL::text, NULL::text, NULL::boolean, NULL::integer, NULL::integer, NULL::integer, NULL::timestamp without time zone, NULL::timestamp without time zone, generate_series(mv.s_year, mv.e_year)
Worker 0: actual time=0.019..33.874 rows=40139 loops=37
-> Parallel Seq Scan on mv (cost=0.00..2098.39 rows=1819 width=134) (actual time=0.029..10.233 rows=10358 loops=74)
Output: mv.***********, mv.***********, mv.***********, mv.***********, mv.***********, mv.***********, mv.***********, mv.***********, mv.***********, mv.***********, mv.***********, mv.***********, mv.***********, mv.***********, mv.***********, mv.***********, mv.***********, mv.***********, mv.***********, mv.***********, mv.***********, mv.***********, mv.***********, mv.***********, mv.***********, mv.***********, mv.***********, mv.***********, mv.***********, mv.***********, mv.***********, mv.***********, mv.***********, mv.***********, mv.***********, mv.***********, mv.***********, mv.***********, mv.***********, mv.***********, mv.***********, mv.***********, mv.***********, mv.***********, mv.***********, mv.***********, mv.***********, mv.***********, mv.***********, mv.***********, mv.***********, mv.***********, mv.***********
Filter: mv.is_active
Rows Removed by Filter: 5992
Worker 0: actual time=0.016..13.530 rows=13204 loops=37
-> Index Only Scan using t9_un on t9 (cost=0.42..4.44 rows=1 width=9) (actual time=0.014..0.018 rows=1 loops=1650)
Output: t9.***********, t9.***********, t9.***********
Index Cond: ((p.mm_id = t3.mm_id) AND (p.year = t3.year))
Heap Fetches: 0
-> Hash Join (cost=27176.63..30848.49 rows=771 width=8) (actual time=0.084..68.596 rows=16622 loops=1994)
Output: t4.***********, t8.sm_f_id
Hash Cond: (t4.f_id = t8.mm_f_id)
-> Unique (cost=27175.38..28718.18 rows=154280 width=126) (actual time=0.084..50.368 rows=166526 loops=1994)
Output: t4.***********, t4.***********, t7.***********
-> Sort (cost=27175.38..27561.08 rows=154280 width=126) (actual time=0.083..19.173 rows=166526 loops=1994)
Output: t4.***********, t4.***********, t7.***********
Sort Key: t4.***********, t4.***********, t7.***********
Sort Method: external sort Disk: 6912kB
-> Append (cost=16.07..3859.65 rows=154280 width=126) (actual time=0.025..67.960 rows=166526 loops=1)
-> Hash Left Join (cost=16.07..773.18 rows=38220 width=126) (actual time=0.024..12.886 rows=40136 loops=1)
Output: t4.***********, t4.***********, t7.***********
Inner Unique: true
Hash Cond: (t4.f_id = t7.f_id)
-> Seq Scan on t4 (cost=0.00..655.20 rows=38220 width=8) (actual time=0.005..3.407 rows=40136 loops=1)
Output: t4.***********, t4.***********, t4.***********, t4.***********, t4.***********, t4.***********, t4.***********
-> Hash (cost=12.70..12.70 rows=270 width=122) (actual time=0.013..0.014 rows=17 loops=1)
Output: t7.***********, t7.***********
Buckets: 1024 Batches: 1 Memory Usage: 9kB
-> Seq Scan on t7 (cost=0.00..12.70 rows=270 width=122) (actual time=0.004..0.008 rows=17 loops=1)
Output: t7.***********, t7.***********
-> Hash Left Join (cost=15.85..2315.07 rows=116060 width=126) (actual time=0.036..41.281 rows=126390 loops=1)
Output: t10.***********, t10.***********, t5.************
Inner Unique: true
Hash Cond: (t10.t_id = t5.t_id)
-> Seq Scan on t10 (cost=0.00..1989.60 rows=116060 width=8) (actual time=0.012..10.756 rows=126390 loops=1)
Output: t10.***********, t10.***********, t10.***********, t10.***********, t10.***********, t10.***********, t10.***********
-> Hash (cost=12.60..12.60 rows=260 width=122) (actual time=0.013..0.014 rows=16 loops=1)
Output: t5.***********, t5.***********
Buckets: 1024 Batches: 1 Memory Usage: 9kB
-> Seq Scan on t5 (cost=0.00..12.60 rows=260 width=122) (actual time=0.005..0.008 rows=16 loops=1)
Output: t5.***********, t5.***********
-> Hash (cost=1.24..1.24 rows=1 width=8) (actual time=0.011..0.011 rows=1 loops=1)
Output: t8.***********, t8.***********
Buckets: 1024 Batches: 1 Memory Usage: 9kB
-> Seq Scan on t8 (cost=0.00..1.24 rows=1 width=8) (actual time=0.008..0.008 rows=1 loops=1)
Output: t8.***********, t8.***********
Filter: (t8.sm_f_id = 8)
Rows Removed by Filter: 18
-> Materialize (cost=0.00..14.88 rows=2 width=4) (actual time=0.001..0.001 rows=1 loops=1994)
Output: t7.***********
-> Seq Scan on t7 (cost=0.00..14.88 rows=2 width=4) (actual time=0.017..0.019 rows=1 loops=1)
Output: t7.***********
Filter: (t7.f_id = 8)
Rows Removed by Filter: 9
Query Identifier: -4992889262453896709
Planning Time: 1.762 ms
Execution Time: 140786.468 ms
On 3/18/25 12:34, PG Bug reporting form wrote:> Query has a specific WHERE filter for tiny lookup table "t8" (19 rows) which> generates an execution plan which has two "Nested Loop" steps instead of> "Hash Join" steps which makes the query finish after 90 seconds (and spiking> the server CPU when being ran by dozens of sessions simultaneously) instead> of ~3 seconds (when commenting out that specific WHERE filter clause I> mentioned below):May you provide EXPLAIN ANALYZE VERBOSE in standard text output format?--regards, Andrei Lepikhov
Re: BUG #18854: PostgreSQL chooses a suboptimal execution plan when using a specific WHERE filter
On 19/3/2025 14:34, Yoni Sade wrote: > I would like to add that commenting out that WHERE clause doesn't work > for all of this query variations so I've found another workaround for > now ("LIMIT 1000000" in *bold*) > > to make the optimizer better estimate the cardinality of the subquery > and change the plan to a better one:* I wouldn't say it is the ultimate root of the problem, but using generate_series, especially with non-constant parameters, seems harmful for the query planning: we can't predict the number of tuples and don't have any statistics on that set. So, your LIMIT works like a hint that helps decide how massive a set of tuples it will generate and seems to be the only option possible. -- regards, Andrei Lepikhov