Thread: BUG #15604: NOT IN condition incorrectly returns False

BUG #15604: NOT IN condition incorrectly returns False

From
PG Bug reporting form
Date:
The following bug has been logged on the website:

Bug reference:      15604
Logged by:          Sergey Romanovsky
Email address:      sergey@romanovsky.org
PostgreSQL version: 10.6
Operating system:   linux Red Hat 4.8.3-9
Description:

Zhijiang Li <zl256@cornell.edu> and I found the following bug described
here: https://github.com/romanovsky/postgres/blob/master/README.md
# Postgres bug: NOT IN condition incorrectly returns False

## Short description

In case the dataset doesn't fit into available memory and Postgres has
decided to use hash lookup for `NOT IN`
we observe that both queries `1 IN SubSelect` and `1 NOT IN SubSelect`
return empty result set.
If you upgrade AWS RDS instance from `db.r4.large` to `db.r4.xlarge` you
won't be able to reproduce the problem.

## Assumptions made
* Examples below assume that there's no request_id=1:
```sql
db=> SELECT request_id FROM postgres_not_in_bug WHERE request_id=1;
 request_id
------------
(0 rows)
```

* Execution plan looks like:
```sql
db=> EXPLAIN SELECT 'there is no 1' AS foo WHERE 1 NOT IN (SELECT request_id
FROM postgres_not_in_bug);
                                    QUERY PLAN
-----------------------------------------------------------------------------------
 Result  (cost=2101.68..2101.69 rows=1 width=32)
   One-Time Filter: (NOT (hashed SubPlan 1))
   SubPlan 1
     ->  Seq Scan on postgres_not_in_bug  (cost=0.00..1791.34 rows=124134
width=6)
(4 rows)
```

## Observed behavior
```sql
db=> SELECT 'there is no 1' AS foo WHERE 1 NOT IN (SELECT request_id FROM
postgres_not_in_bug);
 foo
-----
(0 rows)

db=> SELECT 'there is no 1' as foo WHERE 1 IN (SELECT request_id FROM
postgres_not_in_bug);
 foo
-----
(0 rows)
```

## Expected behavior
```sql
db=> SELECT 'there is no 1' AS foo WHERE 1 NOT IN (SELECT request_id FROM
postgres_not_in_bug);
      foo
---------------
 there is no 1
(1 row)

db=> SELECT 'there is no 1' AS foo WHERE 1 IN (SELECT request_id FROM
postgres_not_in_bug);
       foo
-----------------
(0 rows)
```


## How to reproduce

1. Download `postgres_not_in_bug.pg_dump` from this repo locally to
`/tmp/postgres_not_in_bug.pg_dump`
```bash
curl -H 'Accept: application/vnd.github.v3.raw' -o
/tmp/postgres_not_in_bug.pg_dump -O -L
https://raw.githubusercontent.com/romanovsky/postgres/master/postgres_not_in_bug.pg_dump
```
2. Load the dataset to postgres
```bash
$ cat /tmp/postgres_not_in_bug.pg_dump|psql -h host -U user -d db
--port=5432
SET
SET
SET
SET
SET
 set_config
------------

(1 row)

SET
SET
SET
SET
SET
CREATE TABLE
COPY 124134
```
3. Run queries:
```bash
$ psql -h host -U user -d db --port=5432 --command="SELECT 'there is no 1'
AS foo WHERE 1 NOT IN (SELECT request_id FROM postgres_not_in_bug)"
 foo
-----
(0 rows)

$ psql -h host -U user -d db --port=5432 --command="SELECT 'there is no 1'
AS foo WHERE 1 IN (SELECT request_id FROM postgres_not_in_bug)"
 foo
-----
(0 rows)
```

### Specifications
* Postgres 10.6
```bash
$ psql -h host -U user -d db --port=5432 --command="SELECT version()"
                  
                                                version
--------------------------------------------------------------------------------------------------------
 PostgreSQL 10.6 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.3
20140911 (Red Hat 4.8.3-9), 64-bit
(1 row)
```
* Amazon RDS db.r4.large, i.e. 2 vCPU/15.25GB RAM (see more details here:
https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/Concepts.DBInstanceClass.html#Concepts.DBInstanceClass.Summary)


Re: BUG #15604: NOT IN condition incorrectly returns False

From
Andrew Gierth
Date:
>>>>> "PG" == PG Bug reporting form <noreply@postgresql.org> writes:

 PG> The following bug has been logged on the website:
 PG> Bug reference:      15604
 PG> Logged by:          Sergey Romanovsky
 PG> Email address:      sergey@romanovsky.org
 PG> PostgreSQL version: 10.6
 PG> Operating system:   linux Red Hat 4.8.3-9
 PG> Description:        

 PG> Zhijiang Li <zl256@cornell.edu> and I found the following bug described
 PG> here: https://github.com/romanovsky/postgres/blob/master/README.md
 PG> # Postgres bug: NOT IN condition incorrectly returns False

Not a bug. This is actually how NOT IN is supposed to work, and it has
nothing to do with hash lookups or instance size (the output is the same
whether a hashed or non-hashed plan is used).

Here is why:

select count(*) from postgres_not_in_bug where request_id is null;
 count 
-------
     1
(1 row)

The condition  1 NOT IN (2,NULL)  is equivalent to (1=2) OR (1=NULL),
which evaluates to (false) OR (NULL) which in turn evaluates to NULL.
Since this is not TRUE, the WHERE clause does not accept the row.

When you do NOT IN (select col ...) then the null handling is the same;
if there is _any_ null value in the selected data, then the NOT IN will
never return TRUE (only FALSE or NULL according to whether the value is
found or not).

See also https://wiki.postgresql.org/wiki/Don't_Do_This#Don.27t_use_NOT_IN

-- 
Andrew (irc:RhodiumToad)


Re: BUG #15604: NOT IN condition incorrectly returns False

From
Andrew Gierth
Date:
>>>>> "Andrew" == Andrew Gierth <andrew@tao11.riddles.org.uk> writes:

 Andrew> The condition  1 NOT IN (2,NULL)  is equivalent to (1=2) OR
 Andrew> (1=NULL),

I mean NOT ((1=2) OR (1=NULL)) of course, but NOT (NULL) is still NULL.

-- 
Andrew (irc:RhodiumToad)


Re: BUG #15604: NOT IN condition incorrectly returns False

From
Sergey Romanovsky
Date:
Hi Andrew,
Thanks a lot for fast and informative response!
You're totally right.
We both feel embarrassed and amused at the same time. This is so nice of you to educate us.


On Tue, Jan 22, 2019 at 10:57 PM Andrew Gierth <andrew@tao11.riddles.org.uk> wrote:
>>>>> "PG" == PG Bug reporting form <noreply@postgresql.org> writes:

 PG> The following bug has been logged on the website:
 PG> Bug reference:      15604
 PG> Logged by:          Sergey Romanovsky
 PG> Email address:      sergey@romanovsky.org
 PG> PostgreSQL version: 10.6
 PG> Operating system:   linux Red Hat 4.8.3-9
 PG> Description:       

 PG> Zhijiang Li <zl256@cornell.edu> and I found the following bug described
 PG> here: https://github.com/romanovsky/postgres/blob/master/README.md
 PG> # Postgres bug: NOT IN condition incorrectly returns False

Not a bug. This is actually how NOT IN is supposed to work, and it has
nothing to do with hash lookups or instance size (the output is the same
whether a hashed or non-hashed plan is used).

Here is why:

select count(*) from postgres_not_in_bug where request_id is null;
 count
-------
     1
(1 row)

The condition  1 NOT IN (2,NULL)  is equivalent to (1=2) OR (1=NULL),
which evaluates to (false) OR (NULL) which in turn evaluates to NULL.
Since this is not TRUE, the WHERE clause does not accept the row.

When you do NOT IN (select col ...) then the null handling is the same;
if there is _any_ null value in the selected data, then the NOT IN will
never return TRUE (only FALSE or NULL according to whether the value is
found or not).

See also https://wiki.postgresql.org/wiki/Don't_Do_This#Don.27t_use_NOT_IN

--
Andrew (irc:RhodiumToad)