Thread: BUG #18854: PostgreSQL chooses a suboptimal execution plan when using a specific WHERE filter

The following bug has been logged on the website:

Bug reference:      18854
Logged by:          Yoni Sade
Email address:      yonisade83@gmail.com
PostgreSQL version: 16.8
Operating system:   RDS Linux
Description:

Query has a specific WHERE filter for tiny lookup table "t8" (19 rows) which
generates an execution plan which has two "Nested Loop" steps instead of
"Hash Join" steps which makes the query finish after 90 seconds (and spiking
the server CPU when being ran by dozens of sessions simultaneously) instead
of ~3 seconds (when commenting out that specific WHERE filter clause I
mentioned below):

Query #1 (runtime: ~90 seconds)::

SELECT count(*) AS total FROM ( SELECT 1
FROM v_m
JOIN (SELECT *, generate_series(mv.start_year, mv.end_year) as year
FROM mv) t3
on v_m.m_id = t3.m_id
LEFT JOIN t9 ON t3.mm_id = t9.mm_id and t3.year = t9.year
LEFT JOIN v_mm ON t3.mm_id = v_mm.mm_id
LEFT JOIN t8 ON v_m.id = t8.mm_f_id
LEFT JOIN t7 ON t8.sm_f_id = t7.f_id
WHERE v_m.is_exists = true
AND t3.is_active = true
AND v_m.mk_id IN (7)
AND t3.year >= 2021
AND t3.year <= 2024
AND t8.sm_f_id IN (8) -- this specific clause slows down the query
GROUP BY v_m.m_id, v_m.m_h, t3.make, v_m.img, COALESCE(t9.is_new, false),
t3.year) T;

Execution plan #1:

Node Type    Entity    Cost    Rows    Time    Condition
Aggregate    [NULL]    44501.78 - 44501.79    1    [NULL]    [NULL]
Group    [NULL]    44501.48 - 44501.68    8    [NULL]    [NULL]
Incremental Sort    [NULL]    44501.48 - 44501.60    8    [NULL]    [NULL]
Nested Loop    [NULL]    29224.92 - 44501.36    8    [NULL]    [NULL]
Nested Loop    [NULL]    29224.92 - 44486.38    4    [NULL]    [NULL]
Nested Loop    [NULL]    2048.29 - 13628.25    1    [NULL]    [NULL]
Nested Loop    [NULL]    2047.87 - 13623.80    1    [NULL]    [NULL]
Nested Loop    [NULL]    1047.87 - 1060.89    1    [NULL]    COALESCE(((count(*)) > 0),
FALSE)
Index Scan    t1    0.27 - 8.29    1    [NULL]    (mk_id = 7)
Aggregate    [NULL]    1047.60 - 1049.60    200    [NULL]    [NULL]
Seq Scan    t2    0.00 - 993.28    10864    [NULL]    is_active
Subquery Scan    [NULL]    1000.00 - 12562.73    15    [NULL]    "((t3.year >= 2021)
AND (t3.year <= 2024))"
Gather    [NULL]    1000.00 - 12516.33    3093    [NULL]    [NULL]
ProjectSet    [NULL]    0.00 - 11207.03    1819000    [NULL]    [NULL]
Index Only Scan    t9    0.42 - 4.44    1    [NULL]    "((mm_id = t3.mm_id)
AND (YEAR = t3.year))"
Hash Join    [NULL]    27176.63 - 30848.49    771    [NULL]    (t4.f_id = t6.mm_f_id)
Unique    [NULL]    27175.38 - 28718.18    154280    [NULL]    [NULL]
Sort    [NULL]    27175.38 - 27561.08    154280    [NULL]    [NULL]
Append    [NULL]    16.07 - 3859.65    154280    [NULL]    [NULL]
Hash Join    [NULL]    16.07 - 773.18    38220    [NULL]    (t4.f_id = t7.f_id)
Hash Join    [NULL]    15.85 - 2315.07    116060    [NULL]    (t5.t_id = t.t_id)
Hash    [NULL]    1.24 - 1.24    1    [NULL]    [NULL]
Seq Scan    t8    0.00 - 1.24    1    [NULL]    (sm_f_id = 8)
Materialize    [NULL]    0.00 - 14.88    2    [NULL]    [NULL]
Seq Scan    t7    0.00 - 14.88    2    [NULL]    (f_id = 8)


Query #2 (runtime: ~3 seconds):

SELECT count(*) AS total FROM ( SELECT 1
FROM v_m
JOIN (SELECT *, generate_series(mv.start_year, mv.end_year) as year
FROM mv) t3
on v_m.m_id = t3.m_id
LEFT JOIN t9 ON t3.mm_id = t9.mm_id and t3.year = t9.year
LEFT JOIN v_mm ON t3.mm_id = v_mm.mm_id
LEFT JOIN t8 ON v_m.id = t8.mm_f_id
LEFT JOIN t7 ON t8.sm_f_id = t7.f_id
WHERE v_m.is_exists = true
AND t3.is_active = true
AND v_m.mk_id IN (7)
AND t3.year >= 2021
AND t3.year <= 2024
GROUP BY v_m.m_id, v_m.m_h, t3.make, v_m.img, COALESCE(t9.is_new, false),
t3.year) T;

Execution plan #2:
Node Type    Entity    Cost    Rows    Time    Condition
Aggregate    [NULL]    44501.78 - 44501.79    1    [NULL]    [NULL]
Group    [NULL]    44501.48 - 44501.68    8    [NULL]    [NULL]
Incremental Sort    [NULL]    44501.48 - 44501.60    8    [NULL]    [NULL]
Nested Loop    [NULL]    29224.92 - 44501.36    8    [NULL]    [NULL]
Nested Loop    [NULL]    29224.92 - 44486.38    4    [NULL]    [NULL]
Nested Loop    [NULL]    2048.29 - 13628.25    1    [NULL]    [NULL]
Nested Loop    [NULL]    2047.87 - 13623.80    1    [NULL]    [NULL]
Nested Loop    [NULL]    1047.87 - 1060.89    1    [NULL]    COALESCE(((count(*)) > 0),
FALSE)
Index Scan    t1    0.27 - 8.29    1    [NULL]    (mk_id = 7)
Aggregate    [NULL]    1047.60 - 1049.60    200    [NULL]    [NULL]
Seq Scan    t2    0.00 - 993.28    10864    [NULL]    is_active
Subquery Scan    [NULL]    1000.00 - 12562.73    15    [NULL]    "((t3.year >= 2021)
AND (t3.year <= 2024))"
Gather    [NULL]    1000.00 - 12516.33    3093    [NULL]    [NULL]
ProjectSet    [NULL]    0.00 - 11207.03    1819000    [NULL]    [NULL]
Index Only Scan    t9    0.42 - 4.44    1    [NULL]    "((mm_id = t3.mm_id)
AND (YEAR = t3.year))"
Hash Join    [NULL]    27176.63 - 30848.49    771    [NULL]    (t4.f_id = t6.mm_f_id)
Unique    [NULL]    27175.38 - 28718.18    154280    [NULL]    [NULL]
Sort    [NULL]    27175.38 - 27561.08    154280    [NULL]    [NULL]
Append    [NULL]    16.07 - 3859.65    154280    [NULL]    [NULL]
Hash Join    [NULL]    16.07 - 773.18    38220    [NULL]    (t4.f_id = t7.f_id)
Hash Join    [NULL]    15.85 - 2315.07    116060    [NULL]    (t5.t_id = t.t_id)
Hash    [NULL]    1.24 - 1.24    1    [NULL]    [NULL]
Seq Scan    t8    0.00 - 1.24    1    [NULL]    (sm_f_id = 8)
Materialize    [NULL]    0.00 - 14.88    2    [NULL]    [NULL]
Seq Scan    t7    0.00 - 14.88    2    [NULL]    (f_id = 8)


On 3/18/25 12:34, PG Bug reporting form wrote:
> Query has a specific WHERE filter for tiny lookup table "t8" (19 rows) which
> generates an execution plan which has two "Nested Loop" steps instead of
> "Hash Join" steps which makes the query finish after 90 seconds (and spiking
> the server CPU when being ran by dozens of sessions simultaneously) instead
> of ~3 seconds (when commenting out that specific WHERE filter clause I
> mentioned below):
May you provide EXPLAIN ANALYZE VERBOSE in standard text output format?

-- 
regards, Andrei Lepikhov



I would like to add that commenting out that WHERE clause doesn't work for all of this query variations so I've found another workaround for now ("LIMIT 1000000" in bold)

to make the optimizer better estimate the cardinality of the subquery and change the plan to a better one:

JOIN (SELECT *, generate_series(mv.start_year, mv.end_year) as year
FROM mv LIMIT 1000000) t3


This is the previous (suboptimal) plan in the format you asked for (with table/column names obfuscated):


Aggregate (cost=44501.78..44501.79 rows=1 width=8) (actual time=140784.531..140784.633 rows=1 loops=1)

Output: count(*)

-> Group (cost=44501.48..44501.68 rows=8 width=679) (actual time=4656.869..140784.563 rows=153 loops=1)

Output: NULL::integer, t1.***********, t1.***********, t3.***********, t1.***********, (COALESCE(p.is_new, false)), t3.year

Group Key: t1.***********, t3.***********, (COALESCE(p.is_new, false)), t3.year

-> Incremental Sort (cost=44501.48..44501.60 rows=8 width=675) (actual time=4656.866..140784.160 rows=1994 loops=1)

Output: t1.***********, t3.***********, (COALESCE(p.is_new, false)), t3.***********, t1.***********, t1.***********

Sort Key: t1.***********, t3.***********, (COALESCE(p.is_new, false)), t3.year

Presorted Key: t1.m_id

Full-sort Groups: 24 Sort Method: quicksort Average Memory: 30kB Peak Memory: 30kB

Pre-sorted Groups: 15 Sort Method: quicksort Average Memory: 34kB Peak Memory: 34kB

-> Nested Loop Left Join (cost=29224.92..44501.36 rows=8 width=675) (actual time=239.881..140777.492 rows=1994 loops=1)

Output: t1.***********, t3.***********, COALESCE(p.is_new, false), t3.***********, t1.***********, t1.***********

-> Nested Loop (cost=29224.92..44486.38 rows=4 width=679) (actual time=239.855..140771.693 rows=1994 loops=1)

Output: t1.***********, t1.***********, t1.***********, t3.***********, t3.***********, t9.***********, t8.***********

Join Filter: (t3.mm_id = t4.mm_id)

Rows Removed by Join Filter: 33142274

-> Nested Loop Left Join (cost=2048.29..13628.25 rows=1 width=679) (actual time=13.032..1942.680 rows=1994 loops=1)

Output: t1.***********, t1.***********, t1.***********, t3.***********, t3.***********, t3.***********, t9.***********

-> Nested Loop (cost=2047.87..13623.80 rows=1 width=678) (actual time=13.010..1906.470 rows=1650 loops=1)

Output: t1.***********, t1.***********, t1.***********, t3.***********, t3.***********, t3.***********

Join Filter: (t1.m_id = t3.m_id)

Rows Removed by Join Filter: 632234

-> Nested Loop Left Join (cost=1047.87..1060.89 rows=1 width=552) (actual time=11.398..19.784 rows=37 loops=1)

Output: t1.***********, t1.***********, t1.***********

Inner Unique: true

Join Filter: (t1.m_id = t2.m_id)

Rows Removed by Join Filter: 29232

Filter: COALESCE(((count(*)) > 0), false)

Rows Removed by Filter: 12

-> Index Scan using mds_mk_id_m_id_key on mds m (cost=0.27..8.29 rows=1 width=552) (actual time=0.023..0.223 rows=49 loops=1)

Output: t1.***********, t1.***********, t1.***********, t1.***********, t1.***********, t1.***********, t1.***********, t1.***********, t1.***********, t1.***********, t1.***********, t1.***********, t1.***********

Index Cond: (t1.mk_id = 7)

-> HashAggregate (cost=1047.60..1049.60 rows=200 width=12) (actual time=0.234..0.360 rows=597 loops=49)

Output: count(*), t2.m_id

Group Key: t2.m_id

Batches: 1 Memory Usage: 209kB

-> Seq Scan on t2 (cost=0.00..993.28 rows=10864 width=4) (actual time=0.040..7.570 rows=20717 loops=1)

Output: t2.***********, t2.***********, t2.***********, t2.***********, t2.***********, t2.***********, t2.***********, t2.***********, t2.***********, t2.***********, t2.***********, t2.***********, t2.***********, t2.***********, t2.***********, t2.***********, t2.***********, t2.***********, t2.***********, t2.***********, t2.***********, t2.***********, t2.***********, t2.***********, t2.***********, t2.***********, t2.***********, t2.***********, t2.***********

Filter: t2.is_active

Rows Removed by Filter: 11985

-> Subquery Scan on t3 (cost=1000.00..12562.73 rows=15 width=130) (actual time=1.405..49.852 rows=17132 loops=37)

Output: t3.***********, t3.***********, t3.***********, t3.***********, t3.***********, t3.***********, t3.***********, t3.***********, t3.***********, t3.***********, t3.***********, t3.***********, t3.***********, t3.***********, t3.***********, t3.***********, t3.***********, t3.***********, t3.***********, t3.***********, t3.***********, t3.***********, t3.***********, t3.***********, t3.***********, t3.***********, t3.***********, t3.***********, t3.***********, t3.***********, t3.***********, t3.***********, t3.***********, t3.***********, t3.***********, t3.***********, t3.***********, t3.***********, t3.***********, t3.***********, t3.***********, t3.***********, t3.***********, t3.***********, t3.***********, t3.***********, t3.***********, t3.***********, t3.***********, t3.***********, t3.***********, t3.***********, t3.***********, t3.***********

Filter: ((t3.year >= 2021) AND (t3.year <= 2024))

Rows Removed by Filter: 44378

-> Gather (cost=1000.00..12516.33 rows=3093 width=2335) (actual time=1.365..44.739 rows=61510 loops=37)

Output: NULL::timestamp without time zone, NULL::timestamp without time zone, mv.***********, NULL::text, NULL::integer, mv.***********, NULL::character varying(50), mv.***********, NULL::character varying(255), NULL::integer, NULL::character varying, NULL::integer, NULL::character varying(100), NULL::integer, NULL::character varying(50), NULL::integer, NULL::character varying(50), NULL::integer, NULL::character varying(50), NULL::integer, NULL::character varying(50), NULL::integer, NULL::integer, NULL::integer, NULL::character varying(50), NULL::integer, NULL::character varying(50), NULL::integer, NULL::character varying(50), NULL::integer, NULL::character varying(50), NULL::integer, NULL::text, NULL::text, NULL::integer, NULL::integer, NULL::integer, NULL::integer, NULL::boolean, NULL::boolean, NULL::boolean, NULL::boolean, NULL::integer[], NULL::text[], NULL::text[], NULL::text, NULL::text, NULL::boolean, NULL::integer, NULL::integer, NULL::integer, NULL::timestamp without time zone, NULL::timestamp without time zone, (generate_series(mv.s_year, mv.e_year))

Workers Planned: 1

Workers Launched: 1

-> ProjectSet (cost=0.00..11207.03 rows=1819000 width=2335) (actual time=0.034..25.766 rows=30755 loops=74)

Output: NULL::timestamp without time zone, NULL::timestamp without time zone, mv.***********, NULL::text, NULL::integer, mv.***********, NULL::character varying(50), mv.***********, NULL::character varying(255), NULL::integer, NULL::character varying, NULL::integer, NULL::character varying(100), NULL::integer, NULL::character varying(50), NULL::integer, NULL::character varying(50), NULL::integer, NULL::character varying(50), NULL::integer, NULL::character varying(50), NULL::integer, NULL::integer, NULL::integer, NULL::character varying(50), NULL::integer, NULL::character varying(50), NULL::integer, NULL::character varying(50), NULL::integer, NULL::character varying(50), NULL::integer, NULL::text, NULL::text, NULL::integer, NULL::integer, NULL::integer, NULL::integer, NULL::boolean, NULL::boolean, NULL::boolean, NULL::boolean, NULL::integer[], NULL::text[], NULL::text[], NULL::text, NULL::text, NULL::boolean, NULL::integer, NULL::integer, NULL::integer, NULL::timestamp without time zone, NULL::timestamp without time zone, generate_series(mv.s_year, mv.e_year)

Worker 0: actual time=0.019..33.874 rows=40139 loops=37

-> Parallel Seq Scan on mv (cost=0.00..2098.39 rows=1819 width=134) (actual time=0.029..10.233 rows=10358 loops=74)

Output: mv.***********, mv.***********, mv.***********, mv.***********, mv.***********, mv.***********, mv.***********, mv.***********, mv.***********, mv.***********, mv.***********, mv.***********, mv.***********, mv.***********, mv.***********, mv.***********, mv.***********, mv.***********, mv.***********, mv.***********, mv.***********, mv.***********, mv.***********, mv.***********, mv.***********, mv.***********, mv.***********, mv.***********, mv.***********, mv.***********, mv.***********, mv.***********, mv.***********, mv.***********, mv.***********, mv.***********, mv.***********, mv.***********, mv.***********, mv.***********, mv.***********, mv.***********, mv.***********, mv.***********, mv.***********, mv.***********, mv.***********, mv.***********, mv.***********, mv.***********, mv.***********, mv.***********, mv.***********

Filter: mv.is_active

Rows Removed by Filter: 5992

Worker 0: actual time=0.016..13.530 rows=13204 loops=37

-> Index Only Scan using t9_un on t9 (cost=0.42..4.44 rows=1 width=9) (actual time=0.014..0.018 rows=1 loops=1650)

Output: t9.***********, t9.***********, t9.***********

Index Cond: ((p.mm_id = t3.mm_id) AND (p.year = t3.year))

Heap Fetches: 0

-> Hash Join (cost=27176.63..30848.49 rows=771 width=8) (actual time=0.084..68.596 rows=16622 loops=1994)

Output: t4.***********, t8.sm_f_id

Hash Cond: (t4.f_id = t8.mm_f_id)

-> Unique (cost=27175.38..28718.18 rows=154280 width=126) (actual time=0.084..50.368 rows=166526 loops=1994)

Output: t4.***********, t4.***********, t7.***********

-> Sort (cost=27175.38..27561.08 rows=154280 width=126) (actual time=0.083..19.173 rows=166526 loops=1994)

Output: t4.***********, t4.***********, t7.***********

Sort Key: t4.***********, t4.***********, t7.***********

Sort Method: external sort Disk: 6912kB

-> Append (cost=16.07..3859.65 rows=154280 width=126) (actual time=0.025..67.960 rows=166526 loops=1)

-> Hash Left Join (cost=16.07..773.18 rows=38220 width=126) (actual time=0.024..12.886 rows=40136 loops=1)

Output: t4.***********, t4.***********, t7.***********

Inner Unique: true

Hash Cond: (t4.f_id = t7.f_id)

-> Seq Scan on t4 (cost=0.00..655.20 rows=38220 width=8) (actual time=0.005..3.407 rows=40136 loops=1)

Output: t4.***********, t4.***********, t4.***********, t4.***********, t4.***********, t4.***********, t4.***********

-> Hash (cost=12.70..12.70 rows=270 width=122) (actual time=0.013..0.014 rows=17 loops=1)

Output: t7.***********, t7.***********

Buckets: 1024 Batches: 1 Memory Usage: 9kB

-> Seq Scan on t7 (cost=0.00..12.70 rows=270 width=122) (actual time=0.004..0.008 rows=17 loops=1)

Output: t7.***********, t7.***********

-> Hash Left Join (cost=15.85..2315.07 rows=116060 width=126) (actual time=0.036..41.281 rows=126390 loops=1)

Output: t10.***********, t10.***********, t5.************

Inner Unique: true

Hash Cond: (t10.t_id = t5.t_id)

-> Seq Scan on t10 (cost=0.00..1989.60 rows=116060 width=8) (actual time=0.012..10.756 rows=126390 loops=1)

Output: t10.***********, t10.***********, t10.***********, t10.***********, t10.***********, t10.***********, t10.***********

-> Hash (cost=12.60..12.60 rows=260 width=122) (actual time=0.013..0.014 rows=16 loops=1)

Output: t5.***********, t5.***********

Buckets: 1024 Batches: 1 Memory Usage: 9kB

-> Seq Scan on t5 (cost=0.00..12.60 rows=260 width=122) (actual time=0.005..0.008 rows=16 loops=1)

Output: t5.***********, t5.***********

-> Hash (cost=1.24..1.24 rows=1 width=8) (actual time=0.011..0.011 rows=1 loops=1)

Output: t8.***********, t8.***********

Buckets: 1024 Batches: 1 Memory Usage: 9kB

-> Seq Scan on t8 (cost=0.00..1.24 rows=1 width=8) (actual time=0.008..0.008 rows=1 loops=1)

Output: t8.***********, t8.***********

Filter: (t8.sm_f_id = 8)

Rows Removed by Filter: 18

-> Materialize (cost=0.00..14.88 rows=2 width=4) (actual time=0.001..0.001 rows=1 loops=1994)

Output: t7.***********

-> Seq Scan on t7 (cost=0.00..14.88 rows=2 width=4) (actual time=0.017..0.019 rows=1 loops=1)

Output: t7.***********

Filter: (t7.f_id = 8)

Rows Removed by Filter: 9

Query Identifier: -4992889262453896709

Planning Time: 1.762 ms

Execution Time: 140786.468 ms


‫בתאריך יום ג׳, 18 במרץ 2025 ב-16:16 מאת ‪Andrei Lepikhov‬‏ <‪lepihov@gmail.com‬‏>:‬
On 3/18/25 12:34, PG Bug reporting form wrote:
> Query has a specific WHERE filter for tiny lookup table "t8" (19 rows) which
> generates an execution plan which has two "Nested Loop" steps instead of
> "Hash Join" steps which makes the query finish after 90 seconds (and spiking
> the server CPU when being ran by dozens of sessions simultaneously) instead
> of ~3 seconds (when commenting out that specific WHERE filter clause I
> mentioned below):
May you provide EXPLAIN ANALYZE VERBOSE in standard text output format?

--
regards, Andrei Lepikhov


--
בברכה,
יוני שדה
On 19/3/2025 14:34, Yoni Sade wrote:
> I would like to add that commenting out that WHERE clause doesn't work 
> for all of this query variations so I've found another workaround for 
> now ("LIMIT 1000000" in *bold*)
> 
> to make the optimizer better estimate the cardinality of the subquery 
> and change the plan to a better one:*
I wouldn't say it is the ultimate root of the problem, but using 
generate_series, especially with non-constant parameters, seems harmful 
for the query planning: we can't predict the number of tuples and don't 
have any statistics on that set. So, your LIMIT works like a hint that 
helps decide how massive a set of tuples it will generate and seems to 
be the only option possible.

-- 
regards, Andrei Lepikhov