BUG #18234: Nested Loop joint strategy is ignored for a tiny table joined with UNION ALL of two filtered parts - Mailing list pgsql-bugs

From PG Bug reporting form
Subject BUG #18234: Nested Loop joint strategy is ignored for a tiny table joined with UNION ALL of two filtered parts
Date
Msg-id 18234-571586e70bd0658d@postgresql.org
Whole thread Raw
Responses Re: BUG #18234: Nested Loop joint strategy is ignored for a tiny table joined with UNION ALL of two filtered parts
List pgsql-bugs
The following bug has been logged on the website:

Bug reference:      18234
Logged by:          Dmitry Astapov
Email address:      dastapov@gmail.com
PostgreSQL version: 15.5
Operating system:   Rocky Linux
Description:

Summary of the issue: for a (5-row recordset) JOIN (massive partitioned
recordset indexed by id) USING (id), the (Nested Loop over 5 values)
strategy is completely ignored, and Hash Join or Merge Join is done instead,
which does SeqScan over the "massive recordset".

Verified with PostgreSQL versions 13.13, 14.10, 15.5

Reproduction in DB Fiddle:
https://www.db-fiddle.com/f/sJUUWNgW7pqPWcJwihVoj5/1 (this demonstrates both
the bad behaviour and a way to work around it)

Plan for the badly-behaving query (pulled out of DB Fiddle):

    explain select * from tiny join vw_broken on tiny.id = vw_broken.id;

QUERY PLAN
                
---------------------------------------------------------------------------------------------
Hash Join  (cost=1.11..3798.30 rows=712 width=16)
                
  Hash Cond: (huge.id = tiny.id)
                
  ->  Append  (cost=0.00..3683.32 rows=28466 width=8)
                
        ->  Append  (cost=0.00..3222.91 rows=28181 width=8)
                
              ->  Seq Scan on huge huge_1  (cost=0.00..0.00 rows=1 width=8)
                
                    Filter: filter_out
                
              ->  Seq Scan on huge_partition1 huge_2  (cost=0.00..1541.00
rows=13987 width=8)
                    Filter: filter_out
                
              ->  Seq Scan on huge_partition2 huge_3  (cost=0.00..1541.00
rows=14193 width=8)
                    Filter: filter_out
                
        ->  Append  (cost=0.00..33.42 rows=285 width=8)
                
              ->  Seq Scan on medium medium_1  (cost=0.00..0.00 rows=1
width=8)              
                    Filter: filter_out
                
              ->  Seq Scan on medium_partition1 medium_2  (cost=0.00..16.00
rows=142 width=8)
                    Filter: filter_out
                
              ->  Seq Scan on medium_partition2 medium_3  (cost=0.00..16.00
rows=142 width=8)
                    Filter: filter_out
                
  ->  Hash  (cost=1.05..1.05 rows=5 width=8)
                
        ->  Seq Scan on tiny  (cost=0.00..1.05 rows=5 width=8)
                

I think that there are two key components for triggering the bad
behaviour:

1)The massive recordset on the right side of the JOIN must come from the
UNION ALL of two parts, both of which have a filter, like this view in my
reproduction:

create view vw_broken as
  select id from huge where filter_out
  union all
  select id from medium where filter_out;

2)Tables being UNION ALL'ed should be inheritance parents (aka old-style
partitioned tables). I was unable to reproduce with regular tables, though I
did not try very hard.

The fix is to move the filter condition out of UNION ALL:
create view vw_not_broken as
  select id,filter_out from (
    select id,filter_out from huge
    union all
    select id,filter_out from medium
  ) q
  where filter_out;

What adds insult to injury is that the optimizer seems to be happily pushing
the "where filter_out" condition back into the UNION ALL and down the
inheritance hierarchy (which is exactly what the "broken" view is doing as
well, seemingly):

 explain select * from tiny join vw_not_broken on tiny.id =
vw_not_broken.id;

QUERY PLAN
                                        
---------------------------------------------------------------------------------------------------------------------
Nested Loop  (cost=0.00..151.53 rows=712 width=17)
                                        
  ->  Seq Scan on tiny  (cost=0.00..1.05 rows=5 width=8)
                                        
  ->  Append  (cost=0.00..30.04 rows=6 width=9)
                                        
        ->  Seq Scan on huge  (cost=0.00..0.00 rows=1 width=9)
                                        
              Filter: (filter_out AND (tiny.id = id))
                                        
        ->  Index Scan using huge_partition1_id_idx on huge_partition1
huge_1  (cost=0.29..8.31 rows=1 width=9)      
              Index Cond: (id = tiny.id)
                                        
              Filter: filter_out
                                        
        ->  Index Scan using huge_partition2_id_idx on huge_partition2
huge_2  (cost=0.29..8.31 rows=1 width=9)      
              Index Cond: (id = tiny.id)
                                        
              Filter: filter_out
                                        
        ->  Seq Scan on medium  (cost=0.00..0.00 rows=1 width=9)
                                        
              Filter: (filter_out AND (tiny.id = id))
                                        
        ->  Index Scan using medium_partition1_id_idx on medium_partition1
medium_1  (cost=0.28..6.69 rows=1 width=9)
              Index Cond: (id = tiny.id)
                                        
              Filter: filter_out
                                        
        ->  Index Scan using medium_partition2_id_idx on medium_partition2
medium_2  (cost=0.28..6.69 rows=1 width=9)
              Index Cond: (id = tiny.id)
                                        
              Filter: filter_out
                                        

I was unable to get a "good" plan out of the "bad" view by tweaking enable_
settings, so I suspect that the indexed access path is not considered at
all, for some reason.

I don't think that this is a recently introduced issue, as this email thread
from 2015 seems to be talking about the same issue:
https://www.postgresql.org/message-id/20150521104103.GB10049@pax.zz.de
(sadly, thread petered out without a reproduction / more info from the
reporter).

I'd love to do more debugging and get to the bottom of this. If you have a
hunch or suspicion about the possible origin of the bad behaviour but have
no time to check it yourself - can you please point me to a suspect part of
the optimizer so I can try to pinpoint it?


pgsql-bugs by date:

Previous
From: Tom Lane
Date:
Subject: Re: BUG #18233: coerce_type accepts NULL as pstate and can pass it to coerce_record_to_complex
Next
From: Tom Lane
Date:
Subject: Re: BUG #18234: Nested Loop joint strategy is ignored for a tiny table joined with UNION ALL of two filtered parts