Re: BUG #17978: Unexpected error: "wrong varnullingrels (b) (expected (b 5)) for Var 6/2" triggered by JOIN - Mailing list pgsql-bugs

From Tom Lane
Subject Re: BUG #17978: Unexpected error: "wrong varnullingrels (b) (expected (b 5)) for Var 6/2" triggered by JOIN
Date
Msg-id 3153450.1687208355@sss.pgh.pa.us
Whole thread Raw
In response to Re: BUG #17978: Unexpected error: "wrong varnullingrels (b) (expected (b 5)) for Var 6/2" triggered by JOIN  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: BUG #17978: Unexpected error: "wrong varnullingrels (b) (expected (b 5)) for Var 6/2" triggered by JOIN  (Richard Guo <guofenglinux@gmail.com>)
List pgsql-bugs
I wrote:
> Anyway, what I'm inclined to do is flesh out the attached by updating
> the comments for remove_useless_results() and then push it.  Later
> on we can look for why it's not finding the better join order; that's
> a separable issue, and if it is about avoid-clauseless-joins then we
> might choose to live with it rather than incur a lot of planner cost
> to fix it.

I couldn't resist poking into that, and it seems there's less here
than meets the eye.  I'd been guessing that the presence or absence
of a t2 reference in the WHERE clause was affecting this, but no: the
SpecialJoinInfos look exactly the same for both queries, and the set
of joins considered is the same in both.  What is causing the
different plan shape is that the selectivity estimates for these
WHERE clauses are a lot different:

regression=# explain
regression-# select t2.f1 from int4_tbl t2
regression-#                 left join int4_tbl t3 on t2.f1 > 0
regression-#                 where t3.f1 is null;
                              QUERY PLAN                               
-----------------------------------------------------------------------
 Nested Loop Left Join  (cost=0.00..2.49 rows=1 width=4)
   Join Filter: (t2.f1 > 0)
   Filter: (t3.f1 IS NULL)
   ->  Seq Scan on int4_tbl t2  (cost=0.00..1.05 rows=5 width=4)
   ->  Materialize  (cost=0.00..1.07 rows=5 width=4)
         ->  Seq Scan on int4_tbl t3  (cost=0.00..1.05 rows=5 width=4)
(6 rows)

regression=# explain
regression-# select t2.f1 from int4_tbl t2
regression-#                 left join int4_tbl t3 on t2.f1 > 0
regression-#                 where t2.f1 <> coalesce(t3.f1, -1);
                              QUERY PLAN                               
-----------------------------------------------------------------------
 Nested Loop Left Join  (cost=0.00..2.55 rows=8 width=4)
   Join Filter: (t2.f1 > 0)
   Filter: (t2.f1 <> COALESCE(t3.f1, '-1'::integer))
   ->  Seq Scan on int4_tbl t2  (cost=0.00..1.05 rows=5 width=4)
   ->  Materialize  (cost=0.00..1.07 rows=5 width=4)
         ->  Seq Scan on int4_tbl t3  (cost=0.00..1.05 rows=5 width=4)

and that ends up with the other join order looking better.  We can
synthesize a different non-strict, t3-only qual with a similar
selectivity estimate:

regression=# explain
regression-# select t2.f1 from int4_tbl t2
regression-#                 left join int4_tbl t3 on t2.f1 > 0
regression-#                 where    -1 <> coalesce(t3.f1, -1);
                              QUERY PLAN                               
-----------------------------------------------------------------------
 Nested Loop Left Join  (cost=0.00..2.55 rows=8 width=4)
   Join Filter: (t2.f1 > 0)
   Filter: ('-1'::integer <> COALESCE(t3.f1, '-1'::integer))
   ->  Seq Scan on int4_tbl t2  (cost=0.00..1.05 rows=5 width=4)
   ->  Materialize  (cost=0.00..1.07 rows=5 width=4)
         ->  Seq Scan on int4_tbl t3  (cost=0.00..1.05 rows=5 width=4)

and then if you plug that into the whole query you get a t4-last plan:

regression=# explain (costs off)
regression-# select * from int4_tbl t1
regression-#   left join ((select t2.f1 from int4_tbl t2
regression(#                 left join int4_tbl t3 on t2.f1 > 0
regression(#                 where    -1 <> coalesce(t3.f1, -1)   ) s
regression(#              left join tenk1 t4 on s.f1 > 1)
regression-#     on s.f1 = t1.f1;
                               QUERY PLAN                                
-------------------------------------------------------------------------
 Nested Loop Left Join
   Join Filter: (t2.f1 > 1)
   ->  Hash Right Join
         Hash Cond: (t2.f1 = t1.f1)
         ->  Nested Loop Left Join
               Join Filter: (t2.f1 > 0)
               Filter: ('-1'::integer <> COALESCE(t3.f1, '-1'::integer))
               ->  Seq Scan on int4_tbl t2
               ->  Materialize
                     ->  Seq Scan on int4_tbl t3
         ->  Hash
               ->  Seq Scan on int4_tbl t1
   ->  Materialize
         ->  Seq Scan on tenk1 t4
(14 rows)

So, nothing to see here after all.

            regards, tom lane



pgsql-bugs by date:

Previous
From: Michael Guissine
Date:
Subject: Re: BUG #17974: Walsenders memory usage suddenly spike to 80G+ causing OOM and server reboot
Next
From: Thomas Munro
Date:
Subject: Re: BUG #17949: Adding an index introduces serialisation anomalies.