BUG #17545: Incorrect selectivity for IS NOT DISTINCT FROM and NULLs - Mailing list pgsql-bugs

From PG Bug reporting form
Subject BUG #17545: Incorrect selectivity for IS NOT DISTINCT FROM and NULLs
Date
Msg-id 17545-a0ca4de888953169@postgresql.org
Whole thread Raw
Responses Re: BUG #17545: Incorrect selectivity for IS NOT DISTINCT FROM and NULLs
List pgsql-bugs
The following bug has been logged on the website:

Bug reference:      17545
Logged by:          Roope Salmi
Email address:      rpsalmi@gmail.com
PostgreSQL version: 14.4
Operating system:   Ubuntu 22.04
Description:

Hi,

It appears that selectivity for "IS NOT DISTINCT FROM" clauses regard the
proportion of NULLs in the same way as "=", leading to polar opposite row
count
estimates.

In practice this causes nested loop joins to be used when that is not
appropriate. Not sure if this is a known issue, but I was unable to find
any
previous mentions of it.

Here I create a table with one column and 1000 rows of NULL. A cartesian
product
with "WHERE x.a = y.a" correctly estimates that there are around zero
matching
rows. "x.a IS NOT DISTINCT FROM y.a" incorrectly gives the same estimate,
whereas "x.a = y.a OR (x.a IS NULL AND y.a IS NULL)", which should be
equivalent, gives the correct 1000000.

postgres=# CREATE TABLE test(a INTEGER);
CREATE TABLE
postgres=# INSERT INTO test(a) SELECT NULL FROM generate_series(1, 1000);
INSERT 0 1000
postgres=# ANALYZE test;
ANALYZE
postgres=# EXPLAIN SELECT FROM test x, test y WHERE x.a = y.a;
                              QUERY PLAN                              
----------------------------------------------------------------------
 Merge Join  (cost=127.66..137.67 rows=1 width=0)
   Merge Cond: (x.a = y.a)
   ->  Sort  (cost=63.83..66.33 rows=1000 width=4)
         Sort Key: x.a
         ->  Seq Scan on test x  (cost=0.00..14.00 rows=1000 width=4)
   ->  Sort  (cost=63.83..66.33 rows=1000 width=4)
         Sort Key: y.a
         ->  Seq Scan on test y  (cost=0.00..14.00 rows=1000 width=4)
(8 rows)

postgres=# EXPLAIN SELECT FROM test x, test y
postgres=# WHERE x.a IS NOT DISTINCT FROM y.a;
                              QUERY PLAN                              
----------------------------------------------------------------------
 Nested Loop  (cost=0.00..15030.50 rows=1 width=0)
   Join Filter: (NOT (x.a IS DISTINCT FROM y.a))
   ->  Seq Scan on test x  (cost=0.00..14.00 rows=1000 width=4)
   ->  Materialize  (cost=0.00..19.00 rows=1000 width=4)
         ->  Seq Scan on test y  (cost=0.00..14.00 rows=1000 width=4)
(5 rows)

postgres=# EXPLAIN SELECT FROM test x, test y
    
postgres-# WHERE x.a = y.a OR (x.a IS NULL AND y.a IS NULL);
                              QUERY PLAN                              
----------------------------------------------------------------------
 Nested Loop  (cost=0.00..15030.50 rows=1000000 width=0)
   Join Filter: ((x.a = y.a) OR ((x.a IS NULL) AND (y.a IS NULL)))
   ->  Seq Scan on test x  (cost=0.00..14.00 rows=1000 width=4)
   ->  Materialize  (cost=0.00..19.00 rows=1000 width=4)
         ->  Seq Scan on test y  (cost=0.00..14.00 rows=1000 width=4)
(5 rows)


pgsql-bugs by date:

Previous
From: Jordan Lewis
Date:
Subject: Re: BUG #17542: tsquery returns incorrect results with nested, conjuncted followed-by operators
Next
From: PG Bug reporting form
Date:
Subject: BUG #17546: power() function - value is distorted via automatic type cast